[
https://issues.apache.org/jira/browse/IMPALA-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joe McDonnell updated IMPALA-6783:
----------------------------------
Target Version: Impala 4.0 (was: Impala 3.4.0)
> Rethink the end-to-end queuing at KrpcDataStreamReceiver
> --------------------------------------------------------
>
> Key: IMPALA-6783
> URL: https://issues.apache.org/jira/browse/IMPALA-6783
> Project: IMPALA
> Issue Type: Sub-task
> Components: Distributed Exec
> Affects Versions: Impala 2.12.0
> Reporter: Michael Ho
> Priority: Major
>
> Follow up from IMPALA-6116. We currently bound the memory usage of service
> queue and force a RPC to retry if the memory usage exceeds the configured
> limit. The deserialization of row batches happen in the context of service
> threads. The deserialized row batches are stored in a queue in the receiver
> and its memory consumption is bound by FLAGS_exchg_node_buffer_size_bytes.
> Exceeding that limit, we will put incoming row batches into a deferred RPC
> queue, which will be drained by deserialization threads. This makes it hard
> to size the service queues as its capacity may need to grow as the number of
> nodes in the cluster grows.
> We may need to reconsider the role of service queue: it could just be a
> transition queue before KrpcDataStreamMgr routes the incoming row batches to
> the appropriate receivers. The actual queuing may happen in the receiver. The
> deserialization should always happen in the context of deserialization
> threads so the service threads will just be responsible for routing the RPC
> requests. This allows us to keep a rather small service queue. Incoming
> serialized row batches will always sit in a queue to be drained by
> deserialization threads. We may still need to keep a certain number of
> deserialized row batches around ready to be consumed. In this way, we can
> account for the memory consumption and size the queue based on number of
> senders and memory budget of a query.
> One hurdle is that we need to overcome the undesirable cross-thread
> allocation pattern as rpc_context is allocated from service threads but freed
> by the deserialization thread.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]