[
https://issues.apache.org/jira/browse/IMPALA-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fifteen updated IMPALA-10342:
-----------------------------
Description:
By default, when they encounter an error, both `get_json_object()` and
`DecimalOperators::IntToDecimalVal` raise a warning. Because these functions
are stateless, they keep emitting a warning for every failing row. The
resulting warning flood can easily overwhelm the cluster's processing capacity.
Specifically, we have observed these bottlenecks:
*Exchange Receiver*: the default value of `rpc_max_message_size` is 50MB.
The warning messages carried by ReportExecStatusPB can exceed that limit,
causing the report to be sent without a profile. Even when the report is
smaller than the limit, the bandwidth consumption is non-trivial.
*Storage:* as in https://issues.apache.org/jira/browse/IMPALA-5256 , warning
messages produce huge log files because `stdout/stderr` are not redirected when
glog rolls log files.
*Coordinator*: runtime profiles are serialized to Thrift and stored in the
Coordinator's memory. The warning flood makes `Untracked Memory` rise
rapidly. I took a memory sample and found that most of the memory was used by
RuntimeProfile and Strings.
!image-2020-11-19-17-30-22-918.png!
*Imperfect Solution:*
This problem has caused us a lot of trouble, and we have come up with an
imperfect solution:
# Mute `AddWarning()` so that row-level warnings are dropped by default.
# Introduce a query option to re-enable the warnings when needed.
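The two steps above can be sketched roughly as follows. This is only an illustration of the idea, not the actual patch: `QueryOptions`, `enable_row_level_warnings`, and `WarningSink` are hypothetical names, and the suppressed-warning counter is an extra assumption that is not part of the description above.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for per-query options. The field name is
// illustrative only and is not the option name used in Impala.
struct QueryOptions {
  bool enable_row_level_warnings = false;  // warnings are muted by default
};

// Hypothetical collector that a row-level function would report into.
class WarningSink {
 public:
  explicit WarningSink(const QueryOptions& opts)
      : enabled_(opts.enable_row_level_warnings) {}

  // Stores the message only when the option is on. Otherwise the message
  // is dropped and only a counter is bumped, so nothing accumulates in
  // status reports, log files, or the runtime profile.
  void AddWarning(std::string msg) {
    if (!enabled_) {
      ++suppressed_;
      return;
    }
    warnings_.push_back(std::move(msg));
  }

  const std::vector<std::string>& warnings() const { return warnings_; }
  std::size_t suppressed() const { return suppressed_; }

 private:
  bool enabled_;
  std::size_t suppressed_ = 0;
  std::vector<std::string> warnings_;
};
```

With the default options, a million failing rows leave `warnings()` empty and only increase `suppressed()`. Setting the hypothetical `enable_row_level_warnings` to true restores the old behavior for a query that needs to be debugged.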
We look forward to a better solution from community discussions.
was:
By default, when they encounter an error, both `get_json_object()` and
`DecimalOperators::IntToDecimalVal` raise a warning. Because these functions
are stateless, they keep emitting a warning for every failing row. The
resulting warning flood can easily overwhelm the cluster's processing capacity.
Specifically, we have observed these bottlenecks:
*Exchange Receiver*: the default value of `rpc_max_message_size` is 50MB.
The warning messages carried by ReportExecStatusPB can exceed that limit,
causing the report to be sent without a profile. Even when the report is
smaller than the limit, the bandwidth consumption is non-trivial.
*Storage:* as in https://issues.apache.org/jira/browse/IMPALA-5256 , warning
messages produce huge log files because `stdout/stderr` are not redirected when
glog rolls log files.
*Coordinator*: runtime profiles are serialized to Thrift and stored in the
Coordinator's memory. The warning flood makes `Untracked Memory` rise
rapidly. I took a memory sample and found that most of the memory was used by
RuntimeProfile and Strings.
!image-2020-11-19-17-30-22-918.png!
Solution:
# We have a straightforward solution: change `AddWarning()` to a no-op.
> Alleviating congestion caused by row-level warnings
> ----------------------------------------------------
>
> Key: IMPALA-10342
> URL: https://issues.apache.org/jira/browse/IMPALA-10342
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Fifteen
> Priority: Major
> Attachments: image-2020-11-19-17-30-22-918.png,
> impalad-ram-profile.pdf