weijietong edited a comment on issue #1662: DRILL-6825: apply different hash 
algorithms to different data types
URL: https://github.com/apache/drill/pull/1662#issuecomment-469987352
 
 
   @Ben-Zvi I benchmarked IntegerHash against MurmurHash3_32. IntegerHash is 
roughly 1.7 times as fast as MurmurHash3_32 (scores are in ns/op, lower is 
better):
   ```
   HashBenchInteger.hashInteger  IntegerHash  N/A  avgt  5  3.987 ± 0.162  ns/op
   HashBenchInteger.hashInteger   Murmur3_32  N/A  avgt  5  6.626 ± 0.085  ns/op
   HashBenchInteger.hashLong     IntegerHash  N/A  avgt  5  4.903 ± 0.320  ns/op
   HashBenchInteger.hashLong      Murmur3_32  N/A  avgt  5  8.525 ± 0.649  ns/op
   ```
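   For readers following along, here is a minimal sketch of why a dedicated 
integer hash can be cheaper than a full Murmur3 pass. It is not the code from 
this PR: `murmur3Int` follows the public MurmurHash3 x86_32 algorithm for a 
single 4-byte key, while `cheapIntHash` is a hypothetical stand-in that runs 
only the Murmur3 finalizer. The class and method names are assumptions made 
for illustration.

```java
final class IntHashSketch {

    /** MurmurHash3 32-bit finalizer ("fmix32"); a bijective bit mixer. */
    public static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Full MurmurHash3 x86_32 applied to one 4-byte key. */
    public static int murmur3Int(int key, int seed) {
        int k1 = key * 0xcc9e2d51;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= 0x1b873593;

        int h1 = seed ^ k1;
        h1 = Integer.rotateLeft(h1, 13);
        h1 = h1 * 5 + 0xe6546b64;

        h1 ^= 4; // key length in bytes
        return fmix32(h1);
    }

    /** Hypothetical lightweight hash: seed folded in by XOR, then finalizer only. */
    public static int cheapIntHash(int key, int seed) {
        return fmix32(key ^ seed);
    }

    public static void main(String[] args) {
        // Print both hashes for a few keys so the outputs can be compared.
        for (int key = 0; key < 4; key++) {
            System.out.printf("key=%d murmur3=%08x cheap=%08x%n",
                    key, murmur3Int(key, 0), cheapIntHash(key, 0));
        }
    }
}
```

   The cheaper variant skips the per-block multiply and rotate steps, which is 
the kind of saving a type-specific hash can exploit. Whether its distribution 
is good enough for a given workload would still need to be measured.
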
   The city_1_1 hash algorithm mentioned above performs better than 
Murmur3_32, but not as well as IntegerHash.
   
   I also ran two SQL queries, once with IntegerHash and once with MurmurHash 
applied to the long data types. Hashing is the hotspot in both queries:
   Q1:
   ```
   select c.c_custkey, count(*)
   from dfs.`/tpch100/customer` c
   group by c.c_custkey
   ```
   Q2:
   ```
   select c.c_custkey,c.c_nationkey, count(*)
   from dfs.`/tpch100/customer` c
   group by c.c_custkey,c.c_nationkey
   ```
   The dataset is TPC-H at scale factor 100. The results:
   Q1:
   IntegerHash cuts query time by about 6% relative to MurmurHash3_32 
(IntegerHash: 14.029 sec, MurmurHash3_32: 14.870 sec).
   Q2:
   IntegerHash cuts query time by about 14% relative to MurmurHash3_32 
(IntegerHash: 20.499 sec, MurmurHash3_32: 23.921 sec).
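   The improvement figures depend on which timing is taken as the baseline. 
The short sketch below recomputes them from the numbers reported in this 
comment; the class and method names are made up for illustration.

```java
final class ImprovementSketch {

    /** Fraction of the baseline's time that the candidate saves. */
    public static double timeReduction(double baseline, double candidate) {
        return (baseline - candidate) / baseline;
    }

    /** How many times faster the candidate is than the baseline. */
    public static double speedup(double baseline, double candidate) {
        return baseline / candidate;
    }

    public static void main(String[] args) {
        // Micro-benchmark scores in ns/op (Murmur3_32 first, IntegerHash second).
        System.out.printf("hashInteger speedup: %.2fx%n", speedup(6.626, 3.987));
        System.out.printf("hashLong speedup:    %.2fx%n", speedup(8.525, 4.903));

        // Query wall-clock times in seconds (MurmurHash3_32 first, IntegerHash second).
        System.out.printf("Q1 time saved: %.1f%%%n", 100 * timeReduction(14.870, 14.029));
        System.out.printf("Q2 time saved: %.1f%%%n", 100 * timeReduction(23.921, 20.499));
    }
}
```

   Dividing the saved time by the IntegerHash timing instead of the 
MurmurHash3_32 timing gives slightly larger percentages, so it helps to state 
the baseline alongside the figure.
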
   
   These results suggest that the more integer-typed columns a query groups 
by, the larger the performance gain.
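
   One plausible explanation, sketched below, assumes that a multi-column 
group-by key is hashed one column at a time, with the running hash passed as 
the seed for the next column. That chaining scheme is an assumption for 
illustration and is not confirmed in this comment; `hashLong`, `hashKey`, and 
the mixer are hypothetical names. Under that assumption the per-value hash 
runs once per key column per row, so a cheaper per-value hash saves more as 
key columns are added.

```java
final class GroupKeyHashSketch {

    /** MurmurHash3 32-bit finalizer, used here as a simple bijective mixer. */
    public static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Hypothetical per-value hash for a long: fold to 32 bits, add seed, mix. */
    public static int hashLong(long value, int seed) {
        int folded = (int) (value ^ (value >>> 32));
        return mix(folded ^ seed);
    }

    /** Chains the column hashes: each column's hash seeds the next one. */
    public static int hashKey(long[] keyColumns) {
        int h = 0;
        for (long v : keyColumns) {
            h = hashLong(v, h); // one per-value hash call per key column
        }
        return h;
    }

    public static void main(String[] args) {
        // Q1-like key: one column. Q2-like key: two columns, so two hash calls.
        System.out.printf("one column:  %08x%n", hashKey(new long[] {12345L}));
        System.out.printf("two columns: %08x%n", hashKey(new long[] {12345L, 7L}));
    }
}
```
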
   
   
   
   
   
   

