[
https://issues.apache.org/jira/browse/DRILL-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895194#comment-15895194
]
Kunal Khatua commented on DRILL-5293:
-------------------------------------
To evaluate the performance of the change, I took the two largest tables in a
Parquet-based TPC-H benchmark - Lineitem and Orders - and ran the following
query on a single node with _planner.width.max_per_node=32_:
{code:SQL}select
sum(l.l_extendedprice * (1 - l.l_discount)) as revenue
from
orders o,
lineitem l
where
l.l_orderkey = o.o_orderkey;{code}
Looking at the cold-cache runtimes (in msec) for this query, averaged over 10
runs, we got:
| *Runtime* | *Baseline* | *Patch* | *%gain* |
| _*Avg*_ | 140,568 | 121,995 | 15.22% |
| _*Min*_ | 137,916 | 118,554 | 16.33% |
| _*Max*_ | 142,753 | 123,711 | 15.39% |
Inspecting the individual HashJoin operators (for the min- and max-runtime
examples above), which processed 750M rows (600M from Lineitem; 150M from
Orders), shows the HashJoin operating more than 3x faster.
Processing (CPU) time (in sec) for HashJoin (01-xx-04):
| *01-xx-04* | *Baseline* | *Patch* | *ScaleFactor* |
| _*Min Runtime*_ | 30.004 | 8.977 | 3.34 |
| _*Max Runtime*_ | 30.143 | 9.222 | 3.27 |
Based on this, we can commit the patch.
+1 from my end
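
For context, the bucket-skew effect described in the quoted issue below can be
reproduced with a toy sketch (plain Java, not Drill code; the identity hash and
the partition/bucket counts are illustrative assumptions, standing in for
Drill's generated hash32):

```java
public class BucketSkewDemo {
    // Toy stand-in for Drill's generated hash32 (identity hash on an int key).
    static int hash32(int key) { return key; }

    // Count how many of `buckets` slots are touched by keys that the exchange
    // routed to fragment `frag` of `partitions`, when the hash table reuses
    // the SAME hash value for its bucket index.
    static long occupiedBuckets(int partitions, int frag, int buckets, int rows) {
        boolean[] used = new boolean[buckets];
        for (int key = 0; key < rows; key++) {
            if (Math.floorMod(hash32(key), partitions) == frag) {     // exchange routing
                used[Math.floorMod(hash32(key), buckets)] = true;     // hash-table bucket
            }
        }
        long n = 0;
        for (boolean u : used) if (u) n++;
        return n;
    }

    public static void main(String[] args) {
        // Two-way exchange, 16-bucket table: fragment 0 receives only even
        // hash values, so only the 8 even buckets are ever used.
        System.out.println(occupiedBuckets(2, 0, 16, 1000));
    }
}
```

With an 8-way exchange the skew gets worse: a 64-bucket table in fragment 0
only ever touches the 8 buckets whose index is a multiple of 8, matching the
"one eighth of the buckets" observation in the issue.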
> Poor performance of Hash Table due to same hash value as distribution below
> ---------------------------------------------------------------------------
>
> Key: DRILL-5293
> URL: https://issues.apache.org/jira/browse/DRILL-5293
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Codegen
> Affects Versions: 1.8.0
> Reporter: Boaz Ben-Zvi
> Assignee: Boaz Ben-Zvi
> Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> The computation of the hash value is basically the same whether for the Hash
> Table (used by Hash Agg and Hash Join) or for distribution of rows at the
> exchange. As a result, a specific Hash Table (in a parallel minor fragment)
> gets only rows "filtered out" by the partition below ("upstream"), so the
> pattern of this filtering leads to non-uniform usage of the hash buckets in
> the table.
> Here is a simplified example: An exchange partitions into TWO (minor
> fragments), each running a Hash Agg. The partition sends rows with EVEN hash
> values to the first, and rows with ODD hash values to the second. Now the
> first recomputes the _same_ hash value for its Hash Table -- and only the
> even buckets get used! (With a partition into EIGHT, possibly only one
> eighth of the buckets would be used.)
> This leads to longer hash chains and thus _poor performance_!
> A possible solution -- add a distribution function distFunc (only for
> partitioning) that takes the hash value and "scrambles" it so that the
> entropy in all the bits affects the low bits of the output. This function
> should be applied (in HashPrelUtil) over the generated code that produces the
> hash value, like:
> distFunc( hash32(field1, hash32(field2, hash32(field3, 0))) );
> Tested with a huge hash aggregate (64 M rows) and a parallelism of 8 (
> planner.width.max_per_node = 8 ); minor fragments 0 and 4 used only 1/8 of
> their buckets, the others used 1/4 of their buckets. Maybe the reason for
> this variance is that distribution is using "hash32AsDouble" and hash agg is
> using "hash32".
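
The issue does not spell out the body of distFunc; as an illustration only, a
MurmurHash3-style 32-bit finalizer (fmix32) has exactly the property asked for
-- entropy from every input bit reaches the low output bits -- so even-only
inputs no longer land in even-only buckets (a hypothetical sketch, not
necessarily the committed Drill code):

```java
public class DistFuncSketch {
    // Hypothetical distFunc: MurmurHash3's 32-bit finalizer (fmix32).
    // The xor-shift/multiply rounds avalanche high-order bits into the
    // low-order bits, so the exchange's modulo and the hash table's modulo
    // no longer select correlated bit patterns.
    static int distFunc(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        // Even inputs now produce a mix of even and odd outputs, so both
        // bucket parities get used downstream.
        for (int key = 0; key < 8; key += 2) {
            System.out.println(key + " -> parity " + (distFunc(key) & 1));
        }
    }
}
```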
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)