Michael Smith reassigned IMPALA-12594:
--------------------------------------
Assignee: Csaba Ringhofer
> KrpcDataStreamSender's mem estimate is different than real usage
> ----------------------------------------------------------------
>
> Key: IMPALA-12594
> URL: https://issues.apache.org/jira/browse/IMPALA-12594
> Project: IMPALA
> Issue Type: Bug
> Components: Backend, Frontend
> Reporter: Csaba Ringhofer
> Assignee: Csaba Ringhofer
> Priority: Major
> Fix For: Impala 4.5.0
>
>
> IMPALA-6684 added memory estimates for KrpcDataStreamSender, but there are a
> few gaps between how the frontend estimates memory and how the backend
> actually allocates it.
> The frontend uses the following formula:
>
>     buffer_size = num_channels * 2 * (tuple_buffer_length + compressed_buffer_length)
>
> This accounts for the serialization and compression buffers of each
> OutboundRowBatch.
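A minimal Python sketch of the frontend formula above, for illustration only. The Impala frontend itself is written in Java, and the function name `frontend_sender_estimate` is hypothetical. It covers only the OutboundRowBatch buffers the formula describes:

```python
def frontend_sender_estimate(num_channels: int,
                             tuple_buffer_length: int,
                             compressed_buffer_length: int) -> int:
    """Frontend formula: num_channels * 2 * (serialization + compression buffer)."""
    return num_channels * 2 * (tuple_buffer_length + compressed_buffer_length)


# Illustrative numbers only: 10 channels, 16 KiB tuple buffer, 17 KiB compressed buffer.
print(frontend_sender_estimate(10, 16 * 1024, 17 * 1024))  # 675840
```

Nothing in this formula covers the per-channel RowBatch or the data_stream_sender_buffer_size flag, which is where the gaps below come from.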
> This can both underestimate and overestimate memory usage:
> 1. It does not account for the RowBatch that each channel uses during a
> partitioned exchange to collect the rows belonging to that channel:
> https://github.com/apache/impala/blob/4c762725c707f8d150fe250c03faf486008702d4/be/src/runtime/krpc-data-stream-sender.cc#L232
> 2. It ignores the adjustment of that RowBatch's capacity based on the flag
> data_stream_sender_buffer_size:
> https://github.com/apache/impala/blob/4c762725c707f8d150fe250c03faf486008702d4/be/src/runtime/krpc-data-stream-sender.cc#L379
> This adjustment can either increase or decrease the capacity so that the
> batch reaches the desired total size (16K by default).
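A rough sketch of the two gaps, assuming (this issue does not confirm it) that the backend sets each channel's RowBatch capacity to the number of fixed-length rows that fit in data_stream_sender_buffer_size, with a minimum of one row. The helper names are hypothetical. Under that assumption the per-channel batches add roughly num_channels * 16K on top of the frontend estimate, and very wide rows overshoot the target because the capacity cannot drop below one row:

```python
DEFAULT_BUFFER_SIZE = 16 * 1024  # data_stream_sender_buffer_size, "16K by default"


def channel_batch_capacity(fixed_row_size: int,
                           buffer_size: int = DEFAULT_BUFFER_SIZE) -> int:
    """ASSUMED rule: rows that fit in buffer_size (fixed-length part only), min 1."""
    return max(1, buffer_size // max(fixed_row_size, 1))


def channel_batches_bytes(num_channels: int, fixed_row_size: int,
                          buffer_size: int = DEFAULT_BUFFER_SIZE) -> int:
    """Fixed-length memory of all per-channel RowBatches (gap 1 above)."""
    capacity = channel_batch_capacity(fixed_row_size, buffer_size)
    return num_channels * capacity * fixed_row_size


print(channel_batch_capacity(8))        # 2048
print(channel_batch_capacity(64))       # 256
print(channel_batch_capacity(40000))    # 1: a single row already exceeds 16K
print(channel_batches_bytes(10, 64))    # 163840, i.e. 10 * 16K
```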
> Note that this adjustment ignores variable-length data, so it can massively
> underestimate memory usage in some cases. Meanwhile, the frontend logic
> accounts for string sizes if stats are present. Ideally, both sides would be
> improved and kept in sync so that both use data_stream_sender_buffer_size and
> the string sizes for the estimate (I am not sure about collection types).
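Under the same assumed capacity rule, the hypothetical sketch below shows how ignoring variable-length data underestimates the real batch size, and what a capacity rule that also uses the average string size from stats might look like. It illustrates the direction suggested above, not the actual fix, and it does not handle collection types:

```python
BUFFER_SIZE = 16 * 1024  # data_stream_sender_buffer_size, "16K by default"


def capacity_fixed_only(fixed_row_size: int, buffer_size: int = BUFFER_SIZE) -> int:
    """ASSUMED current rule: var-len data is ignored when sizing the batch."""
    return max(1, buffer_size // max(fixed_row_size, 1))


def capacity_with_var_len(fixed_row_size: int, avg_var_len_bytes: float,
                          buffer_size: int = BUFFER_SIZE) -> int:
    """HYPOTHETICAL synced rule: also count average string bytes from stats."""
    row_bytes = max(fixed_row_size + avg_var_len_bytes, 1)
    return max(1, int(buffer_size // row_bytes))


def batch_bytes(capacity: int, fixed_row_size: int, avg_var_len_bytes: float) -> float:
    """Approximate memory of one full batch, including string data."""
    return capacity * (fixed_row_size + avg_var_len_bytes)


fixed, avg_str = 16, 200  # illustrative: 16-byte fixed row, 200-byte average string
cap_now = capacity_fixed_only(fixed)
cap_synced = capacity_with_var_len(fixed, avg_str)
print(cap_now, batch_bytes(cap_now, fixed, avg_str))        # 1024 221184
print(cap_synced, batch_bytes(cap_synced, fixed, avg_str))  # 75 16200
```

With these made-up numbers the fixed-only rule lets one batch grow to about 13.5 times the 16K target, while the var-len-aware rule stays just under it.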