Dan Hecht has posted comments on this change. Change subject: IMPALA-3286: Software prefetching for hash table build. ......................................................................
Patch Set 7: (9 comments) Thanks, I think this is simpler. There's more cleanup we can do but we can do it outside of this change. http://gerrit.cloudera.org:8080/#/c/2896/7/be/src/exec/hash-table.h File be/src/exec/hash-table.h: Line 544: row_ I believe this is an alias for scratch_row_. If that's correct, let's rename it too. Line 578: If EvalAndHashBuild() and EvalAndHashProbe() : /// aren't called before this function, this is kind of misleading cause it won't happen automatically -- the caller needs to decide. How about just saying: If 'row' is non-NULL, it will be evaluated once against either the build or probe exprs (...) before calling Equals(). If 'row' is NULL, then EvalAndHashBuild() or EvalAndHashProbe() must have been already called by the caller. http://gerrit.cloudera.org:8080/#/c/2896/7/be/src/exec/partitioned-hash-join-node-ir.cc File be/src/exec/partitioned-hash-join-node-ir.cc: Line 311: uint32_t* cur_hash_value = hash_values_.data(); DCHECK_LE(batch->num_rows(), hash_values_.size()); DCHECK_LE(batch->num_rows(), null_bitmap_.num_bits()); Line 318: hash_values_.data() this probably results in a load. you could either hoist it, or just iterate over 'i' like you do in the second loop (which isn't bad since sizeof(*cur_hash_value) is a constant power of two so LEA instruction). Line 332: null_bitmap_.Set<false>(i, false); why do we do this? are we just resetting it for the next batch? if that's the case, wouldn't it be faster to just memset (SetAllBits) it at the end rather than bit-by-bit? http://gerrit.cloudera.org:8080/#/c/2896/7/be/src/exec/partitioned-hash-join-node.h File be/src/exec/partitioned-hash-join-node.h: Line 509: of each row for the current batch Line 512: be delete Line 512: for the current batch http://gerrit.cloudera.org:8080/#/c/2896/7/be/src/runtime/row-batch.h File be/src/runtime/row-batch.h: Line 400: /// To access the current row, use 'iter.Get()'. This macro cannot be nested. Why this change? In any case, if you really want to expose the iterator, you should let the "caller" name it via a third argument: FOREACH_ROW(_row_batch, _start_row_idx, _iter) so that there's less magic (and things like nested loops work). -- To view, visit http://gerrit.cloudera.org:8080/2896 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib85e7fc162ad25c849b9e716b629e226697cd940 Gerrit-PatchSet: 7 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
