Chen: Great! Several yeas ago, I was comparing AsterixDB with Hive/Tez&Spark, and noticed threads within each node were highly unbalanced. I think your answer would be the answer. Best!
> -----原始邮件----- > 发件人: "Chen Luo" <[email protected]> > 发送时间: 2020-09-27 01:15:43 (星期日) > 收件人: [email protected], [email protected] > 抄送: > 主题: Serious Hash Collisions in Hash Join/Groupby > > Hi team, > > Recently I found a serious performance bug for the current hash > join/groupby implementation [1] and submitted a fix for it [2]. The damage > caused by this bug depends on the query itself and the underlying data, but > for a specific join query TPC-H Q5 whose initial query time was very > strange, *the query time was reduced from ~3500s to ~1000s on 16 nodes > after applying this fix*. > > When we perform hash join/groupby, keys are first hash partitioned into > each NC partition (hash1(key)%P, where P is the number of NC partitions). > Within each NC partition, a hash table is built by hashing each key to a > slot (hash2(key)%N, where N is the number of slots). The key problem is > that we used the same hash function for both hash1 and hash2! In general, > this may lead to a lot of hash collisions. To see this problem, consider > what happens with NC partition 0. We know that all keys assigned to NC > partition 0 must satisfy hash(key) % P == 0. Now suppose we have 16 NC > partitions (P = 16) and N is a multiple of 4. Since hash(key)%16 == 0, we > know that hash(key)%N must be a multiple of 4! This means all slots that > are multiple of 1,2,3 will be empty, and all keys will be clustered into > slots that are multiples of 4. To fix this problem, we can simply use a > different hash function for hash join/groupby. > > If you are running experiments related to join queries and have seen some > unexpected performance results, it'll be helpful to try this fix to see > what happens. > > Best regards, > Chen Luo > > [1] https://issues.apache.org/jira/browse/ASTERIXDB-2783 > [2] https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/8123
