Hi team, Recently I found a serious performance bug for the current hash join/groupby implementation [1] and submitted a fix for it [2]. The damage caused by this bug depends on the query itself and the underlying data, but for a specific join query TPC-H Q5 whose initial query time was very strange, *the query time was reduced from ~3500s to ~1000s on 16 nodes after applying this fix*.
When we perform hash join/groupby, keys are first hash partitioned into each NC partition (hash1(key)%P, where P is the number of NC partitions). Within each NC partition, a hash table is built by hashing each key to a slot (hash2(key)%N, where N is the number of slots). The key problem is that we used the same hash function for both hash1 and hash2! In general, this may lead to a lot of hash collisions. To see this problem, consider what happens with NC partition 0. We know that all keys assigned to NC partition 0 must satisfy hash(key) % P == 0. Now suppose we have 16 NC partitions (P = 16) and N is a multiple of 4. Since hash(key)%16 == 0, we know that hash(key)%N must be a multiple of 4! This means all slots that are multiple of 1,2,3 will be empty, and all keys will be clustered into slots that are multiples of 4. To fix this problem, we can simply use a different hash function for hash join/groupby. If you are running experiments related to join queries and have seen some unexpected performance results, it'll be helpful to try this fix to see what happens. Best regards, Chen Luo [1] https://issues.apache.org/jira/browse/ASTERIXDB-2783 [2] https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/8123
