[ https://issues.apache.org/jira/browse/IMPALA-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fifteen updated IMPALA-10342:
-----------------------------
    Description: 
By default, when they encounter an error, both `get_json_object()` and 
`DecimalOperators::IntToDecimalVal` raise a warning.

Because these functions are stateless, they raise a new warning for every 
offending row, and the resulting warning flood can easily overwhelm the 
cluster's processing capacity.

To be specific, we have observed these bottlenecks:

*Exchange Receiver*: the default value of `rpc_max_message_size` is 50MB. 
The flood of warning messages carried by ReportExecStatusPB may exceed that 
limit, resulting in status reports without profiles. Even when a report stays 
under the limit, its bandwidth consumption is non-trivial.

*Storage:* as in IMPALA-5256, the warning flood produces huge log files, 
because `stdout/stderr` is not redirected when glog rolls its logs. As a 
result, we repeatedly had to clear log files and restart executors. 

*Coordinator*: runtime profiles are serialized to Thrift and stored in the 
coordinator's memory. The warning flood makes `Untracked Memory` rise 
rapidly. I took a heap profile (with pprof) and found that most memory was 
used by RuntimeProfile and strings. 

  !image-2020-11-23-09-57-49-840.png!

 

*A preliminary solution:*

We suffered a lot from this problem, and have come up with a preliminary 
solution:
 # Mute AddWarning() by default, a straightforward fix.
 # Introduce a query option to re-enable the warnings when needed.

 *Testing:*

With warnings muted, we found that the load on the coordinator nodes is greatly 
reduced and heap profiles are no longer dominated by RuntimeProfile.

 

We look forward to suggestions for a *better direction* from the community. Thanks!

 


> A way to alleviate congestion caused by row-level warnings 
> -----------------------------------------------------------
>
>                 Key: IMPALA-10342
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10342
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Fifteen
>            Priority: Major
>         Attachments: image-2020-11-19-17-30-22-918.png, 
> image-2020-11-23-09-57-49-840.png, impalad-ram-profile.pdf



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
