[
https://issues.apache.org/jira/browse/IMPALA-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fifteen updated IMPALA-10342:
-----------------------------
Description:
By default, when they encounter an error, both `get_json_object()` and
`DecimalOperators::IntToDecimalVal` raise a warning.
Due to their stateless nature, the resulting warning flood can easily
overwhelm the cluster's processing capacity.
To be specific, we have observed these bottlenecks:
*Exchange Receiver*: the default value for `rpc_max_message_size` is 50MB.
The flood of warning messages carried by ReportExecStatusPB may exceed that
limit, resulting in a status report without a profile. Even if the report
stays under the limit, the bandwidth consumption is non-trivial.
*Storage:* as in IMPALA-5256, flooding warnings produce huge log files, since
`stdout/stderr` won't be redirected when glog is rolling logs. As a result, we
have repeatedly had to clear log files and restart executors.
*Coordinator*: runtime profiles are serialized to Thrift and stored in the
Coordinator's memory. The warning flood makes `Untracked Memory` rise
rapidly. I took a heap profile (with pprof) and found that most of the memory
was used by RuntimeProfile and strings.
*Imperfect Solution:*
We have suffered a lot from this problem and have come up with an imperfect
solution.
# Mute AddWarning() by default.
# Introduce a query option to re-enable the warnings when needed.
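The two steps above can be sketched roughly as follows. This is a minimal, standalone illustration and not the actual Impala patch: `QueryOptions`, `enable_row_level_warnings`, `WarningCollector` and the suppressed-warning counter are all hypothetical names introduced for this example, and only the idea of gating AddWarning() behind a query option comes from this issue.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for per-query options. The option name is an
// assumption for illustration, not an existing Impala query option.
struct QueryOptions {
  bool enable_row_level_warnings = false;  // warnings are muted by default
};

// Hypothetical stand-in for the per-query state that collects warnings.
class WarningCollector {
 public:
  explicit WarningCollector(const QueryOptions& opts)
      : enabled_(opts.enable_row_level_warnings) {}

  // Records the warning only when the option re-enables warnings.
  // Returns true if the message was recorded, false if it was muted.
  bool AddWarning(const std::string& msg) {
    if (!enabled_) {
      ++suppressed_;  // keep a cheap counter instead of the full message
      return false;
    }
    warnings_.push_back(msg);
    return true;
  }

  std::size_t num_recorded() const { return warnings_.size(); }
  std::size_t num_suppressed() const { return suppressed_; }

 private:
  bool enabled_;
  std::size_t suppressed_ = 0;
  std::vector<std::string> warnings_;
};
```

With the default options every call to AddWarning() is dropped and only the counter grows, so neither the status report nor the runtime profile carries per-row messages. Setting the (hypothetical) option to true restores the original behavior for debugging a specific query.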
*Testing:*
With warning messages muted, the burden on C nodes is greatly reduced and
heap profiles are no longer dominated by RuntimeProfile.
We look forward to a *better direction* from the community. Thanks.
was:
By default, when they encounter an error, both `get_json_object()` and
`DecimalOperators::IntToDecimalVal` raise a warning. Due to their
stateless nature, the functions keep raising warnings again and again. The
warning flood can easily overwhelm the cluster's processing capacity.
To be specific, we have observed these bottlenecks:
*Exchange Receiver*: the default value for `rpc_max_message_size` is 50MB.
The flood of warning messages carried by ReportExecStatusPB may exceed that
limit, resulting in a status report without a profile. Even if the report
stays under the limit, the bandwidth consumption is non-trivial.
*Storage:* as in IMPALA-5256, flooding warnings produce huge log files, since
`stdout/stderr` won't be redirected when glog is rolling logs. As a result, we
have repeatedly had to clear log files and restart executors.
*Coordinator*: runtime profiles are serialized to Thrift and stored in the
Coordinator's memory. The warning flood makes `Untracked Memory` rise
rapidly. I took a heap profile (with pprof) and found that most of the memory
was used by RuntimeProfile and strings.
!image-2020-11-19-17-30-22-918.png!
*Imperfect Solution:*
We have suffered a lot from this problem and have come up with an imperfect
solution.
# Mute AddWarning() by default.
# Introduce a query option to re-enable the warnings when needed.
*Testing:*
With warning messages muted, the burden on C nodes is greatly reduced and
heap profiles are no longer dominated by RuntimeProfile.
We look forward to a *better direction* from the community. Thanks.
> A way to alleviate congestion caused by row-level warnings
> -----------------------------------------------------------
>
> Key: IMPALA-10342
> URL: https://issues.apache.org/jira/browse/IMPALA-10342
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Fifteen
> Priority: Major
> Attachments: image-2020-11-19-17-30-22-918.png,
> image-2020-11-23-09-57-49-840.png, impalad-ram-profile.pdf
>
>