Michael Ho has posted comments on this change.

Change subject: IMPALA-3286: Prefetching for PHJ probing.
......................................................................


Patch Set 1:

(10 comments)

Patch set 2 will cache the evaluated expressions' values and hash values. This 
is needed to avoid repeated probing expression evaluation which can be very 
expensive.  A lot of the cached state has been moved into HashTableCtx.

http://gerrit.cloudera.org:8080/#/c/2959/1/be/src/exec/hash-table.inline.h
File be/src/exec/hash-table.inline.h:

Line 136: inline void HashTable::PrefetchBucketData(int64_t bucket_idx) {
> It seems to me like the extra overhead here might hurt in some cases - ther
This is removed in the new patch.


Line 140: UNLIKELY
> It seems like this is workload-dependent rather than always unlikely. E.g. 
This is removed in the new patch.


Line 153: inline void HashTable::PrefetchBucket(uint32_t hash) {
> With multiple overloads only differentiated by integer type, it's too easy 
Down to two functions in the new patch.


http://gerrit.cloudera.org:8080/#/c/2959/1/be/src/exec/partitioned-hash-join-node-ir.cc
File be/src/exec/partitioned-hash-join-node-ir.cc:

Line 321:   // Prefetch the hash table buckets.
> If it turns out that this slows down probing of small hash tables, we could
Removed in the new patch.


Line 323:     hash_tbl_iterator_.PrefetchBucket<true>();
> Maybe a brief comment to explain why it's necessary to prefetch this separa
Removed in the new patch.


Line 343:       if (LIKELY(hash_tbl != NULL)) 
hash_tbl->PrefetchBucketData<true>(hash);
> We talked about this a little offline, but prefetching the bucket data is a
Removed in the new patch. Prefetching the bucket data doesn't seem to buys us 
much.


http://gerrit.cloudera.org:8080/#/c/2959/1/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

Line 251:       uint32_t** cur_hash_value, int* remaining_capacity,
> Mention that this increments cur_hash_value to line up with probe_batch_ite
Not needed in the new patch.


Line 417:   /// computed during prefetching.
> Document that these are sized to the RuntimeState's batch_size
Removed in the new patch.


Line 418:   std::vector<uint32_t> hash_values_;
> Mention that the indices line up with current_probe_batch_ (I think that's 
Removed in the new patch.


Line 421:   boost::scoped_ptr<Bitmap> null_bitmap_;
> Same here.
Removed in the new patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/2959
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib42b93d99d09c833571e39d20d58c11ef73f3cc0
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to