[
https://issues.apache.org/jira/browse/IMPALA-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong updated IMPALA-4268:
----------------------------------
Target Version: Impala 3.2.0 (was: Product Backlog)
> Rework coordinator buffering to buffer more data
> ------------------------------------------------
>
> Key: IMPALA-4268
> URL: https://issues.apache.org/jira/browse/IMPALA-4268
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 2.8.0
> Reporter: Henry Robinson
> Assignee: Pooja Nilangekar
> Priority: Major
> Labels: query-lifecycle, resource-management
> Attachments: rows-produced-histogram.png
>
>
> {{PlanRootSink}} executes the producer thread (the coordinator fragment
> execution thread) in a separate thread to the consumer (i.e. the thread
> handling the fetch RPC), which calls {{GetNext()}} to retrieve the rows. The
> implementation was simplified by handing off a single batch at a time from
> the producers to consumer.
> This decision causes some problems:
> * Many context switches for the sender. Adding buffering would allow the
> sender to append to the buffer and continue progress without a context switch.
> * Query execution can't release resources until the client has fetched the
> final batch, because the coordinator fragment thread is still running and
> potentially producing backpressure all the way down the plan tree.
> * The consumer can't fulfil fetch requests greater than Impala's internal
> BATCH_SIZE, because it is only given one batch at a time.
> The tricky part is managing the mismatch between the size of the row batches
> processed in {{Send()}} and the size of the fetch result asked for by the
> client without impacting performance too badly. The sender materializes
> output rows in a {{QueryResultSet}} that is owned by the coordinator. That is
> not, currently, a splittable object - instead it contains the actual RPC
> response struct that will hit the wire when the RPC completes. As
> asynchronous sender does not know the batch size, because it can in theory
> change on every fetch call (although most reasonable clients will not
> randomly change the fetch size).
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]