[
https://issues.apache.org/jira/browse/IMPALA-5705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yida Wu reassigned IMPALA-5705:
-------------------------------
Assignee: Yida Wu
> Parallelise read I/O by prefetching pages when iterating over unpinned
> BufferedTupleStream
> ------------------------------------------------------------------------------------------
>
> Key: IMPALA-5705
> URL: https://issues.apache.org/jira/browse/IMPALA-5705
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend
> Affects Versions: Impala 2.10.0
> Reporter: Tim Armstrong
> Assignee: Yida Wu
> Priority: Major
>
> We could improve read I/O performance when iterating over unpinned streams in
> the hash join and hash aggregation by using additional memory to prefetch
> pages ahead of the current read position. Currently, iterating over an
> unpinned stream uses only a single buffer and issues a read I/O only after
> it has finished processing the previous page, so reads never overlap with
> processing.
> This slows down processing of spilled probe rows in the hash join and spilled
> unaggregated rows in the hash aggregation.
> We'd need to figure out how to expose this in the BufferedTupleStream
> interface. One likely option: when preparing to read a stream, the client
> specifies a number of bytes to read ahead in the stream, trading additional
> memory for better read performance.
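
The read-ahead idea described above can be sketched roughly as follows. This
is only an illustration under stated assumptions, not Impala's
BufferedTupleStream interface: PrefetchingPageReader, Page and the read_page
callback are hypothetical names, the read-ahead amount is expressed in pages
rather than bytes to keep the example short, and std::async stands in for
whatever asynchronous I/O mechanism the backend would actually use.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical page type: the bytes of one spilled page.
using Page = std::vector<char>;

// Hypothetical reader, not Impala's API. While the caller processes one page,
// up to `read_ahead_pages` further reads are already in flight.
class PrefetchingPageReader {
 public:
  // `read_page(i)` is an assumed blocking, thread-safe read of page i.
  PrefetchingPageReader(std::size_t num_pages, std::size_t read_ahead_pages,
                        std::function<Page(std::size_t)> read_page)
      : num_pages_(num_pages),
        read_ahead_pages_(read_ahead_pages),
        read_page_(std::move(read_page)) {
    TopUp();  // Start reading as soon as the stream is prepared for reading.
  }

  bool HasNext() const { return num_returned_ < num_pages_; }

  // Returns pages in order; blocks only if the next read has not finished yet.
  Page Next() {
    assert(HasNext());
    if (in_flight_.empty()) IssueRead();  // Read-ahead of 0: read on demand.
    Page page = in_flight_.front().get();
    in_flight_.pop_front();
    ++num_returned_;
    TopUp();  // Refill the window before the caller starts processing `page`.
    return page;
  }

 private:
  // Starts an asynchronous read of the next page that has not been issued.
  void IssueRead() {
    in_flight_.push_back(
        std::async(std::launch::async, read_page_, next_to_issue_));
    ++next_to_issue_;
  }

  // Issues reads until `read_ahead_pages_` are in flight or the stream ends.
  void TopUp() {
    while (in_flight_.size() < read_ahead_pages_ &&
           next_to_issue_ < num_pages_) {
      IssueRead();
    }
  }

  const std::size_t num_pages_;
  const std::size_t read_ahead_pages_;
  const std::function<Page(std::size_t)> read_page_;
  std::size_t next_to_issue_ = 0;
  std::size_t num_returned_ = 0;
  std::deque<std::future<Page>> in_flight_;
};
```

In this sketch the memory cost is one buffer for the page being processed
plus read_ahead_pages buffers for reads in flight. A read-ahead of 0 falls
back to the single-buffer behaviour the description calls out, where each
read is issued only once the previous page has been handed back.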
--
This message was sent by Atlassian Jira
(v8.20.7#820007)