I vaguely remember a similar performance bug years ago -- maybe that was
w.r.t. the hash function(s) used in the series of levels when joining
really big data. There I think we ended up moving through a
family/series of hash functions. Does this fix do the same...? (Just
wondering if there's already other infrastructure in the code for this.)
On 9/26/20 10:15 AM, Chen Luo wrote:
Hi team,
Recently I found a serious performance bug for the current hash
join/groupby implementation [1] and submitted a fix for it [2]. The damage
caused by this bug depends on the query itself and the underlying data, but
for a specific join query TPC-H Q5 whose initial query time was very
strange, *the query time was reduced from ~3500s to ~1000s on 16 nodes
after applying this fix*.
When we perform hash join/groupby, keys are first hash partitioned into
each NC partition (hash1(key)%P, where P is the number of NC partitions).
Within each NC partition, a hash table is built by hashing each key to a
slot (hash2(key)%N, where N is the number of slots). The key problem is
that we used the same hash function for both hash1 and hash2! In general,
this may lead to a lot of hash collisions. To see this problem, consider
what happens with NC partition 0. We know that all keys assigned to NC
partition 0 must satisfy hash(key) % P == 0. Now suppose we have 16 NC
partitions (P = 16) and N is a multiple of 4. Since hash(key)%16 == 0, we
know that hash(key)%N must be a multiple of 4! This means all slots that
are multiple of 1,2,3 will be empty, and all keys will be clustered into
slots that are multiples of 4. To fix this problem, we can simply use a
different hash function for hash join/groupby.
If you are running experiments related to join queries and have seen some
unexpected performance results, it'll be helpful to try this fix to see
what happens.
Best regards,
Chen Luo
[1] https://issues.apache.org/jira/browse/ASTERIXDB-2783
[2] https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/8123