Michael Ho created IMPALA-6783:
----------------------------------
Summary: Rethink the end-to-end queuing at KrpcDataStreamReceiver
Key: IMPALA-6783
URL: https://issues.apache.org/jira/browse/IMPALA-6783
Project: IMPALA
Issue Type: Improvement
Components: Distributed Exec
Affects Versions: Impala 2.12.0
Reporter: Michael Ho
Follow-up from IMPALA-6116. We currently bound the memory usage of the service
queue and force an RPC to retry if the memory usage exceeds the configured
limit. The deserialization of row batches happens in the context of the service
threads. The deserialized row batches are stored in a queue in the receiver,
whose memory consumption is bounded by FLAGS_exchg_node_buffer_size_bytes.
Once that limit is exceeded, incoming row batches are put into a deferred RPC
queue, which is drained by deserialization threads. This makes it hard to size
the service queues, as their capacity may need to grow with the number of
nodes in the cluster.
We may need to reconsider the role of the service queue: it could be just a
transition queue from which KrpcDataStreamMgr routes incoming row batches to
the appropriate receivers, with the actual queuing happening in the receiver.
Deserialization should always happen in the context of deserialization
threads, so the service threads are responsible only for routing the RPC
requests. This allows us to keep a rather small service queue. Incoming
serialized row batches will always sit in a queue to be drained by
deserialization threads. We may still need to keep a certain number of
deserialized row batches around, ready to be consumed. In this way, we can
account for the memory consumption and size the queue based on the number of
senders and the memory budget of a query.
One hurdle is the undesirable cross-thread allocation pattern we need to
overcome: rpc_context is allocated from a service thread but freed by a
deserialization thread.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)