Michael Ho has posted comments on this change.

Change subject: IMPALA-3286: Software prefetching for hash table build.
......................................................................
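The change subject names the technique: software prefetching during the hash table build. The following is a minimal C++ sketch of the general two-pass pattern. It assumes a simple open-addressing table with linear probing. The class PrefetchingHashTable, its methods, and the hash function are hypothetical and are not Impala's HashTable API. The first pass hashes every row of the batch into a per-batch hash_values_ array, with the .data() pointer hoisted out of the loops, and issues __builtin_prefetch (a GCC/Clang builtin) for each target bucket. The second pass performs the actual inserts, by which point those bucket cache lines have hopefully arrived.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical open-addressing table (linear probing) used only to illustrate
// a batched build with software prefetching; this is not Impala's HashTable.
class PrefetchingHashTable {
 public:
  explicit PrefetchingHashTable(size_t min_buckets)
      : buckets_(RoundUpPow2(min_buckets)) {}

  // Inserts one batch of keys in two passes. Returns false, without inserting
  // anything, if the batch might not fit (conservative capacity check).
  bool InsertBatch(const std::vector<int64_t>& keys) {
    if (num_filled_ + keys.size() > buckets_.size()) return false;
    hash_values_.resize(keys.size());
    uint32_t* hash_values = hash_values_.data();  // hoisted out of both loops

    // Pass 1: hash every key, remember the hash, and prefetch its bucket.
    // __builtin_prefetch(addr, rw = 1 (write), locality = 1) is only a hint.
    for (size_t i = 0; i < keys.size(); ++i) {
      hash_values[i] = Hash(keys[i]);
      __builtin_prefetch(&buckets_[BucketIdx(hash_values[i])], 1, 1);
    }
    // Pass 2: probe and insert, reusing the hashes computed in pass 1.
    for (size_t i = 0; i < keys.size(); ++i) Insert(keys[i], hash_values[i]);
    return true;
  }

  bool Contains(int64_t key) const {
    const uint32_t hash = Hash(key);
    size_t idx = BucketIdx(hash);
    for (size_t probes = 0; probes < buckets_.size(); ++probes) {
      const Bucket& b = buckets_[idx];
      if (!b.filled) return false;
      if (b.hash == hash && b.key == key) return true;
      idx = (idx + 1) & (buckets_.size() - 1);
    }
    return false;
  }

  size_t size() const { return num_filled_; }

 private:
  struct Bucket {
    bool filled = false;
    uint32_t hash = 0;
    int64_t key = 0;
  };

  static size_t RoundUpPow2(size_t n) {
    size_t p = 1;
    while (p < n) p <<= 1;
    return p;
  }

  // Simple 64-bit mixer (MurmurHash3-style finalizer steps), truncated to 32 bits.
  static uint32_t Hash(int64_t key) {
    uint64_t x = static_cast<uint64_t>(key);
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    return static_cast<uint32_t>(x);
  }

  // Bucket count is a power of two, so masking replaces modulo.
  size_t BucketIdx(uint32_t hash) const { return hash & (buckets_.size() - 1); }

  // Linear probing; duplicate keys are ignored. Capacity was checked by caller.
  void Insert(int64_t key, uint32_t hash) {
    size_t idx = BucketIdx(hash);
    while (buckets_[idx].filled) {
      if (buckets_[idx].hash == hash && buckets_[idx].key == key) return;
      idx = (idx + 1) & (buckets_.size() - 1);
    }
    buckets_[idx].filled = true;
    buckets_[idx].hash = hash;
    buckets_[idx].key = key;
    ++num_filled_;
  }

  std::vector<Bucket> buckets_;
  std::vector<uint32_t> hash_values_;  // per-row hashes of the current batch
  size_t num_filled_ = 0;
};
```

Whether this pays off depends on batch size. The batch must be large enough that pass 1 hides the memory latency of the prefetched buckets, but small enough that those lines are not evicted before pass 2 reaches them. Because a prefetch is only a hint, it never changes the table's contents.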
Patch Set 7:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/2896/7/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

Line 544: row_
> I believe this is an alias for scratch_row_. If that's correct, let's renam
Done

Line 578: If EvalAndHashBuild() and EvalAndHashProbe()
        : /// aren't called before this function,
> this is kind of misleading cause it won't happen automatically -- the calle
The comment has been rephrased as suggested.

http://gerrit.cloudera.org:8080/#/c/2896/7/be/src/exec/partitioned-hash-join-node-ir.cc
File be/src/exec/partitioned-hash-join-node-ir.cc:

Line 311: uint32_t* cur_hash_value = hash_values_.data();
> DCHECK_LE(batch->num_rows(), hash_values_.size());
Done

Line 318: hash_values_.data()
> this probably results in a load. you could either hoist it, or just iterate
Used .data() instead in the new code.

Line 332: null_bitmap_.Set<false>(i, false);
> why do we do this? are we just resetting it for the next batch? if that's t
Yes, that's the idea. I was trying to avoid some cache misses from memset, but that point is moot since we always access the bitmap in the second loop anyway.

http://gerrit.cloudera.org:8080/#/c/2896/7/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

Line 509:
> of each row for the current batch
Done

Line 512: be
> delete
Done

Line 512:
> for the current batch
Done

http://gerrit.cloudera.org:8080/#/c/2896/7/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

Line 400: /// To access the current row, use 'iter.Get()'. This macro cannot be nested.
> Why this change? In any case, if you really want to expose the iterator, y
The way it was written before required '_row_batch' to have a different name for each use of this macro in the same scope. I tried limiting the scope of the iterator by writing:

  for (RowBatch::Iterator iter(...), TupleRow* row = ...; ....)

but this won't compile, because the init-statement of a for loop can only declare variables of a single type. We could work around that with a pattern like:

  FOREACH_ROW(...) { } ROFEACH_ROW();

but I think the current approach isn't too bad either. The macro has been updated as you suggested to let callers specify the name of the iterator.

--
To view, visit http://gerrit.cloudera.org:8080/2896

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib85e7fc162ad25c849b9e716b629e226697cd940
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Marcel Kornacker
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Mostafa Mokhtar
Gerrit-Reviewer: Tim Armstrong
Gerrit-HasComments: Yes
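The reply on row-batch.h Line 400 explains why the iterator is exposed and named by the caller. Below is a minimal sketch of that idea. It uses hypothetical stand-ins for RowBatch, RowBatch::Iterator and TupleRow, which are not Impala's real definitions, and the macro's parameter order is likewise an assumption. A for-init-statement can declare variables of only one type, so the iterator is the sole loop variable and the row is fetched with iter.Get(). Because the caller chooses the iterator's name and it lives only inside its own for statement, two uses in one scope cannot collide, and nested loops can each reach their own row.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Impala's TupleRow and RowBatch (illustration only).
struct TupleRow {
  int value;
};

class RowBatch {
 public:
  explicit RowBatch(std::vector<TupleRow> rows) : rows_(std::move(rows)) {}

  // Minimal forward iterator over the rows of one batch.
  class Iterator {
   public:
    explicit Iterator(RowBatch* batch) : batch_(batch), idx_(0) {}
    bool AtEnd() const { return idx_ >= batch_->rows_.size(); }
    void Next() { ++idx_; }
    TupleRow* Get() { return &batch_->rows_[idx_]; }

   private:
    RowBatch* batch_;
    size_t idx_;
  };

 private:
  std::vector<TupleRow> rows_;
};

// Hypothetical macro: the caller names the iterator, which is the only
// variable declared in the for-init-statement and is scoped to this loop.
#define FOREACH_ROW(row_batch, iter) \
  for (RowBatch::Iterator iter(row_batch); !iter.AtEnd(); iter.Next())
```

For example, `FOREACH_ROW(&batch, outer) { FOREACH_ROW(&batch, inner) { ... } }` visits every pair of rows, with outer.Get() and inner.Get() referring to different rows.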
