[
https://issues.apache.org/jira/browse/IMPALA-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17107601#comment-17107601
]
Tim Armstrong commented on IMPALA-2797:
---------------------------------------
We probably won't do any work on this, since this queue is no longer used when
mt_dop > 0.
> scanner threads can act like a thundering herd
> ----------------------------------------------
>
> Key: IMPALA-2797
> URL: https://issues.apache.org/jira/browse/IMPALA-2797
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 2.3.0
> Reporter: Todd Lipcon
> Priority: Major
> Attachments: trace_Tue_Dec_22_2015_2.27.03_PM.json.gz
>
>
> I noticed this issue with the Kudu scan node implementation, but I imagine it
> could happen with HDFS as well:
> - on a big box, we started up 48 scanner threads for a 'SELECT COUNT(*)' query
> - the underlying batches that are read from Kudu return a few million rows
> each (because it's trying to do large IOs to amortize round-trips, and the
> projection is empty for COUNT(*))
> -- the scanner thread chops these Kudu batches into RowBatches of 1000 rows
> each and pushes those onto the RowBatchQueue
> Because each backend IO (scan RPC to Kudu) results in thousands of Impala
> RowBatches, we end up with the main thread pulling "round robin" from all of
> the scanner threads, rather than exhausting one Kudu batch before moving to
> the next. The issue here is that we see the following:
> - when the query starts, 48 threads hammer the Kudu server with Scan RPCs
> - the Kudu server is then completely quiet for ~30 seconds while they drain
> their buffers
> - all of the buffers "empty" at basically the same time, and we get another
> herd of IO on the Kudu side.
> It would be preferable to make the RowBatchQueue "unfair" in some way, such
> that the main thread exhausts entire IO buffers at a time, rather than
> pulling little bits from each of the threads in a round-robin fashion.
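
The difference between the two draining orders described in the issue can be illustrated with a toy model. This is only a sketch under stated assumptions, not Impala's RowBatchQueue: the function name `refill_ticks` and its parameters are hypothetical, the consumer pulls exactly one RowBatch per tick, and each scanner holds a single pre-filled buffer that is never refilled. The returned ticks are the moments at which a scanner's buffer runs empty, which is when that scanner would issue its next scan RPC.

```python
from collections import deque


def refill_ticks(num_scanners, batches_per_io, unfair):
    """Return the consumer ticks at which each scanner's buffer runs empty.

    Toy model (hypothetical, not Impala code): every scanner starts with one
    buffer of `batches_per_io` row batches, and the consumer removes one
    batch per tick.
    """
    buffers = [deque(range(batches_per_io)) for _ in range(num_scanners)]
    ticks = []
    current = 0
    tick = 0
    while any(buffers):
        # Skip scanners whose buffer is already empty.
        while not buffers[current]:
            current = (current + 1) % num_scanners
        buffers[current].popleft()
        tick += 1
        if not buffers[current]:
            # This scanner would now go back to the storage server.
            ticks.append(tick)
        elif not unfair:
            # Round-robin: move to the next scanner after every batch.
            current = (current + 1) % num_scanners
        # Unfair: stay on the same scanner until its buffer is exhausted.
    return ticks


print(refill_ticks(4, 3, unfair=False))  # [9, 10, 11, 12]
print(refill_ticks(4, 3, unfair=True))   # [3, 6, 9, 12]
```

With round-robin draining, all four buffers run empty on consecutive ticks at the very end, which corresponds to the herd of scan RPCs after a long quiet period. With the unfair order, the empty points are spaced one buffer apart, so the follow-up RPCs would be spread evenly over the same interval.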