[ 
https://issues.apache.org/jira/browse/IMPALA-8005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068920#comment-17068920
 ] 

ASF subversion and git services commented on IMPALA-8005:
---------------------------------------------------------

Commit df6196e064bc7453bee8c7e644bb591391ee3ce2 in impala's branch 
refs/heads/master from Anurag Mantripragada
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=df6196e ]

IMPALA-8005: Randomize partitioning exchanges.

Currently, every partitioning exchange uses the same hash seed at
the sender. When a table's shuffling keys are skewed, multiple
queries that shuffle on those keys hash the heavy keys to the same
destination fragments on a particular host, potentially
overloading that host.
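
The effect described above can be illustrated with a small simulation. This is only a sketch, not Impala's code: the hash function, `FIXED_SEED`, `NUM_DESTINATIONS`, and the helper names are all hypothetical stand-ins chosen for illustration.

```python
import hashlib
from collections import Counter

FIXED_SEED = 0        # hypothetical constant standing in for a shared hash seed
NUM_DESTINATIONS = 4  # hypothetical number of receiving fragment instances


def destination(key: str, seed: int, num_destinations: int) -> int:
    """Map a shuffling key to a destination index using a seeded hash."""
    data = seed.to_bytes(8, "little") + key.encode("utf-8")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "little") % num_destinations


def rows_per_destination(keys, seed, num_destinations):
    """Count how many rows each destination receives."""
    counts = Counter(destination(k, seed, num_destinations) for k in keys)
    return [counts.get(d, 0) for d in range(num_destinations)]


# Skewed input: a single key accounts for 90 of the 100 rows.
skewed_keys = ["hot"] * 90 + [f"k{i}" for i in range(10)]

# Two separate queries that shuffle on the same keys with the same seed.
query_a = rows_per_destination(skewed_keys, FIXED_SEED, NUM_DESTINATIONS)
query_b = rows_per_destination(skewed_keys, FIXED_SEED, NUM_DESTINATIONS)

print(query_a)
print(query_b)
```

Because the seed never changes, both queries produce the same per-destination counts, and the destination that receives the "hot" key carries at least 90 rows in each of them.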

This patch seeds the hash with the query id, so that partitioning
exchanges with the same shuffling keys do not always hash to the
same destination.
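
As a rough sketch of the idea (not the patch itself), the snippet below keys a hash with a query id so that the same shuffling key can map to different destinations in different queries. The function name, the string form of the query id, and the choice of BLAKE2b are assumptions made for illustration; how Impala actually derives the seed from the query id is not shown here.

```python
import hashlib


def seeded_destination(key: bytes, query_id: str, num_destinations: int) -> int:
    """Pick a destination for `key`, using the query id as the hash key.

    Hypothetical helper: BLAKE2b in keyed mode stands in for whatever
    seeded hash the exchange sender really uses.
    """
    h = hashlib.blake2b(key, digest_size=8, key=query_id.encode("utf-8"))
    return int.from_bytes(h.digest(), "little") % num_destinations


NUM_DESTINATIONS = 4  # hypothetical number of receiving fragment instances

# Hypothetical query ids; real ones would come from the coordinator.
query_ids = [f"query-{i:04d}" for i in range(64)]

# Where does the same heavy key land for each query?
hot_key_destinations = [
    seeded_destination(b"hot", qid, NUM_DESTINATIONS) for qid in query_ids
]

print(sorted(set(hot_key_destinations)))
```

Within one query the mapping stays deterministic, so every sender still routes a given key to the same receiver. Across queries the heavy key is spread over several destinations instead of always landing on one.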

Testing:
Added a test to data-stream-test to verify that the data values
arriving at each destination differ between queries.

Change-Id: I1936e6cc3e8d66420a5a9301f49221ca38f3e468
Reviewed-on: http://gerrit.cloudera.org:8080/15497
Reviewed-by: Joe McDonnell <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Randomize partitioning exchanges destinations
> ---------------------------------------------
>
>                 Key: IMPALA-8005
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8005
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Distributed Exec
>    Affects Versions: Impala 3.1.0
>            Reporter: Michael Ho
>            Assignee: Anurag Mantripragada
>            Priority: Major
>              Labels: ramp-up
>
> Currently, we use the same hash seed for partitioning exchanges at the 
> sender. For a table with skew in distribution in the shuffling keys, multiple 
> queries using the same shuffling keys for exchanges will end up hashing to 
> the same destination fragments running on a particular host and potentially 
> overloading that host.
> We should consider using the query id or other query specific information to 
> seed the hashing function to randomize the destinations for different 
> queries. Thanks to [~tlipcon] for pointing this problem out.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

