Re:PHJ node assignment

Quanlong Huang Mon, 12 Feb 2018 03:52:45 -0800

IMU, the left side is always located with the hash join node. If the stats are 
correct, the left side will always be a larger table/input. There're two 
terminologies in the hash join algorithm: build and probe. The smaller table 
that can be built into an in-memory hash table is called the "build" input. 
It's represented at the right side. After the in-memory hash table is built, 
the larger table will be scanned and rows will be probed in the hash table to 
find matched results. The larger table is called the "probe" input and 
represented at the left side.So not all rows are sent across the network to 
perform a hash join. Usually the larger table is scanned locally. Network 
traffic comes from the "build" input. It's smaller and sometimes can even be 
represented as a BloomFilter (one kind of RuntimeFilter in Impala).

However, there's still one case that all rows are sent across the network 
anyway. That is when all tables are not located in the Impala cluster (e.g. 
Impala is deployed in a portion of the Hadoop cluster). Scanning the tables 
both consumes network traffic. However, when performing hash join, the results 
of the right side will be sent to the left side, since they have smaller size 
and consumes less network traffic than sending the left side.

I find this paper in "Impala Reading List" has much more details and deserves 
to be read more times: 
Hash joins and hash teams in Microsoft SQL Server (Graefe, Bunker, Cooper)

HTH

At 2018-02-12 18:13:09, "Jeszy" <jes...@gmail.com> wrote:
>IIUC, every row scanned in a partitioned hash join (both sides) is sent
>across the network (an exchange on HASH(key)). The targets of this exchange
>are nodes that have data locality with the left side of the join. Why does
>Impala do it that way?
>
>Since all rows are sent across the network anyway, Impala could just use
>all the nodes in the cluster. The upside would be better parallelism for
>the join itself as well as for all the operators sitting on top of it. Is
>there a downside I'm forgetting?
>If not, is there a jira tracking this already? Haven't found one.
>
>Thanks!

Re:PHJ node assignment

Reply via email to