Michael Ho has posted comments on this change. Change subject: IMPALA-3286: Prefetching for PHJ probing. ......................................................................
Patch Set 1: (10 comments) Patch set 2 will cache the evaluated expressions' values and hash values. This is needed to avoid repeated probing expression evaluation which can be very expensive. A lot of the cached state has been moved into HashTableCtx. http://gerrit.cloudera.org:8080/#/c/2959/1/be/src/exec/hash-table.inline.h File be/src/exec/hash-table.inline.h: Line 136: inline void HashTable::PrefetchBucketData(int64_t bucket_idx) { > It seems to me like the extra overhead here might hurt in some cases - ther This is removed in the new patch. Line 140: UNLIKELY > It seems like this is workload-dependent rather than always unlikely. E.g. This is removed in the new patch. Line 153: inline void HashTable::PrefetchBucket(uint32_t hash) { > With multiple overloads only differentiated by integer type, it's too easy Down to two functions in the new patch. http://gerrit.cloudera.org:8080/#/c/2959/1/be/src/exec/partitioned-hash-join-node-ir.cc File be/src/exec/partitioned-hash-join-node-ir.cc: Line 321: // Prefetch the hash table buckets. > If it turns out that this slows down probing of small hash tables, we could Removed in the new patch. Line 323: hash_tbl_iterator_.PrefetchBucket<true>(); > Maybe a brief comment to explain why it's necessary to prefetch this separa Removed in the new patch. Line 343: if (LIKELY(hash_tbl != NULL)) hash_tbl->PrefetchBucketData<true>(hash); > We talked about this a little offline, but prefetching the bucket data is a Removed in the new patch. Prefetching the bucket data doesn't seem to buys us much. http://gerrit.cloudera.org:8080/#/c/2959/1/be/src/exec/partitioned-hash-join-node.h File be/src/exec/partitioned-hash-join-node.h: Line 251: uint32_t** cur_hash_value, int* remaining_capacity, > Mention that this increments cur_hash_value to line up with probe_batch_ite Not needed in the new patch. Line 417: /// computed during prefetching. > Document that these are sized to the RuntimeState's batch_size Removed in the new patch. Line 418: std::vector<uint32_t> hash_values_; > Mention that the indices line up with current_probe_batch_ (I think that's Removed in the new patch. Line 421: boost::scoped_ptr<Bitmap> null_bitmap_; > Same here. Removed in the new patch. -- To view, visit http://gerrit.cloudera.org:8080/2959 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib42b93d99d09c833571e39d20d58c11ef73f3cc0 Gerrit-PatchSet: 1 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
