[
https://issues.apache.org/jira/browse/IMPALA-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong updated IMPALA-4268:
----------------------------------
Summary: Rework coordinator buffering to buffer more data (was: buffer
more than a batch of rows at coordinator)
> Rework coordinator buffering to buffer more data
> ------------------------------------------------
>
> Key: IMPALA-4268
> URL: https://issues.apache.org/jira/browse/IMPALA-4268
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 2.8.0
> Reporter: Henry Robinson
> Assignee: Bikramjeet Vig
> Priority: Major
> Labels: query-lifecycle, resource-management
> Attachments: rows-produced-histogram.png
>
>
> In IMPALA-2905, we are introducing a {{PlanRootSink}} that handles the
> production of output rows at the root of a plan.
> The implementation in IMPALA-2905 has the plan execute in a separate thread
> from the consumer, which calls {{GetNext()}} to retrieve the rows. However,
> the sender thread blocks until {{GetNext()}} is called, so that there are no
> complications about memory usage and ownership from having several batches
> in flight at one time.
> However, this also leads to many context switches, as each {{GetNext()}} call
> yields to the sender to produce the rows. If the sender were to fill a buffer
> asynchronously, the consumer could pull from that buffer without taking a
> context switch in many cases (and the extra buffering might smooth out any
> performance spikes due to client delays, which currently directly affect plan
> execution).
> The tricky part is managing the mismatch between the size of the row batches
> processed in {{Send()}} and the size of the fetch result asked for by the
> client. The sender materializes output rows in a {{QueryResultSet}} that is
> owned by the coordinator. That is not, currently, a splittable object -
> instead it contains the actual RPC response struct that will hit the wire
> when the RPC completes. An asynchronous sender cannot know the fetch size,
> which may change on every fetch call. So the {{GetNext()}} implementation
> would need to be able to split a {{QueryResultSet}} to match the requested
> fetch size, and handle stitching together several {{QueryResultSets}} -
> without doing extra copies.
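
To make the split-and-stitch requirement concrete, here is a minimal sketch of one way a buffer could serve an arbitrary fetch size without copying rows. It is not Impala code: RowBatch, RowSlice and RowBuffer are hypothetical names invented for illustration, a batch is modelled as a plain vector of strings rather than a QueryResultSet, and the locking and memory limits a real asynchronous sender would need are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one materialized batch of output rows.
using RowBatch = std::vector<std::string>;

// Read-only view over rows [begin, end) of a shared batch. No row is copied;
// the slice only keeps the batch alive through the shared_ptr.
struct RowSlice {
  std::shared_ptr<const RowBatch> batch;
  std::size_t begin;
  std::size_t end;
  std::size_t size() const { return end - begin; }
};

// Hypothetical buffer between sender and consumer (assumption: single
// threaded here; synchronization and memory accounting are omitted).
class RowBuffer {
 public:
  // Sender side: enqueue a whole batch, whatever size the plan produced.
  void AddBatch(std::shared_ptr<const RowBatch> batch) {
    assert(batch != nullptr);
    if (batch->empty()) return;
    buffered_rows_ += batch->size();
    batches_.push_back(std::move(batch));
  }

  // Consumer side: return up to fetch_size rows as slices. A batch larger
  // than the request is split; a request larger than the front batch is
  // stitched together from several batches.
  std::vector<RowSlice> GetNext(std::size_t fetch_size) {
    std::vector<RowSlice> result;
    while (fetch_size > 0 && !batches_.empty()) {
      std::shared_ptr<const RowBatch> front = batches_.front();
      const std::size_t available = front->size() - front_offset_;
      const std::size_t take = available < fetch_size ? available : fetch_size;
      result.push_back(RowSlice{front, front_offset_, front_offset_ + take});
      front_offset_ += take;
      fetch_size -= take;
      buffered_rows_ -= take;
      if (front_offset_ == front->size()) {
        // Front batch fully consumed: drop it and start at the next one.
        batches_.pop_front();
        front_offset_ = 0;
      }
    }
    return result;
  }

  std::size_t BufferedRows() const { return buffered_rows_; }

 private:
  std::deque<std::shared_ptr<const RowBatch>> batches_;
  std::size_t front_offset_ = 0;   // rows already consumed from the front batch
  std::size_t buffered_rows_ = 0;  // rows still waiting to be fetched
};
```

With batches of 3 and 2 rows buffered, GetNext(2) would return one slice over the first two rows, and a second GetNext(2) would return two slices: the last row of the first batch and the first row of the second. The catch noted above still applies: this only works if the result representation can be sliced, which the current RPC response struct inside QueryResultSet cannot.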
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)