[
https://issues.apache.org/jira/browse/IMPALA-8005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032771#comment-17032771
]
Joe McDonnell commented on IMPALA-8005:
---------------------------------------
It looks like the code for this revolves around EXCHANGE_HASH_SEED in
krpc-data-stream-sender.h/.cc:
[https://github.com/apache/impala/blob/master/be/src/runtime/krpc-data-stream-sender.h#L253]
Other code is in the KrpcDataStreamSender constructor (see init of channels_)
as well as HashAndAddRows(), HashRow(), and AddRowToChannel().
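
For illustration only, here is a minimal standalone sketch of the per-query
seeding idea from the issue description quoted below. It is not Impala code:
QueryId, kBaseExchangeHashSeed, Mix64(), PerQuerySeed(), HashKey() and
ChannelFor() are all hypothetical names, the base seed value is arbitrary, and
the hash (FNV-1a plus a splitmix64-style finalizer) is a stand-in for whatever
the sender actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a query id made of two 64-bit halves.
struct QueryId {
  uint64_t hi;
  uint64_t lo;
};

// Arbitrary constant for this sketch; NOT the value used in Impala.
constexpr uint64_t kBaseExchangeHashSeed = 0x9e3779b97f4a7c15ULL;

// splitmix64-style finalizer: a bijective 64-bit bit mixer.
inline uint64_t Mix64(uint64_t x) {
  x ^= x >> 30;
  x *= 0xbf58476d1ce4e5b9ULL;
  x ^= x >> 27;
  x *= 0x94d049bb133111ebULL;
  x ^= x >> 31;
  return x;
}

// Derive a seed from the fixed base seed and the query id, so that
// different queries get different seeds while every sender of the
// same query computes the same one.
inline uint64_t PerQuerySeed(const QueryId& id) {
  return Mix64(kBaseExchangeHashSeed ^ Mix64(id.hi) ^ id.lo);
}

// Seeded FNV-1a over the key bytes, followed by a final mix.
inline uint64_t HashKey(const std::string& key, uint64_t seed) {
  uint64_t h = seed ^ 0xcbf29ce484222325ULL;
  for (unsigned char c : key) {
    h ^= c;
    h *= 0x100000001b3ULL;
  }
  return Mix64(h);
}

// Pick the destination channel for a row's partitioning key.
inline size_t ChannelFor(const std::string& key, uint64_t seed,
                         size_t num_channels) {
  assert(num_channels > 0);
  return static_cast<size_t>(HashKey(key, seed) % num_channels);
}
```

Because the seed depends only on the query id, all senders feeding the same
exchange still route equal keys to the same destination, which hash
partitioning requires. Only the key-to-destination mapping changes from one
query to the next, so a hot key is no longer pinned to the same host for every
query.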
> Randomize partitioning exchanges destinations
> ---------------------------------------------
>
> Key: IMPALA-8005
> URL: https://issues.apache.org/jira/browse/IMPALA-8005
> Project: IMPALA
> Issue Type: Improvement
> Components: Distributed Exec
> Affects Versions: Impala 3.1.0
> Reporter: Michael Ho
> Assignee: Anurag Mantripragada
> Priority: Major
> Labels: ramp-up
>
> Currently, we use the same hash seed for partitioning exchanges at the
> sender. For a table with skew in the distribution of its shuffling keys,
> multiple queries using the same shuffling keys for exchanges will end up
> hashing to the same destination fragments running on a particular host,
> potentially overloading that host.
> We should consider using the query id or other query specific information to
> seed the hashing function to randomize the destinations for different
> queries. Thanks to [~tlipcon] for pointing this problem out.