Dan Hecht has posted comments on this change.

Change subject: IMPALA-3286: Software prefetching for hash table build.
......................................................................


Patch Set 7:

(9 comments)

Thanks, I think this is simpler.  There's more cleanup we can do but we can do 
it outside of this change.

http://gerrit.cloudera.org:8080/#/c/2896/7/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

Line 544: row_
I believe this is an alias for scratch_row_. If that's correct, let's rename it 
too.


Line 578: If EvalAndHashBuild() and EvalAndHashProbe()
        :   /// aren't called before this function,
this is kind of misleading cause it won't happen automatically -- the caller 
needs to decide.  How about just saying:

If 'row' is non-NULL, it will be evaluated once against either the build or 
probe exprs (...) before calling Equals(). If 'row' is NULL, then 
EvalAndHashBuild() or EvalAndHashProbe() must have been already called by the 
caller.


http://gerrit.cloudera.org:8080/#/c/2896/7/be/src/exec/partitioned-hash-join-node-ir.cc
File be/src/exec/partitioned-hash-join-node-ir.cc:

Line 311:   uint32_t* cur_hash_value = hash_values_.data();
DCHECK_LE(batch->num_rows(), hash_values_.size());
DCHECK_LE(batch->num_rows(), null_bitmap_.num_bits());


Line 318: hash_values_.data()
this probably results in a load. you could either hoist it, or just iterate 
over 'i' like you do in the second loop (which isn't bad since 
sizeof(*cur_hash_value) is a constant power of two so LEA instruction).


Line 332:       null_bitmap_.Set<false>(i, false);
why do we do this? are we just resetting it for the next batch? if that's the 
case, wouldn't it be faster to just memset (SetAllBits) it at the end rather 
than bit-by-bit?


http://gerrit.cloudera.org:8080/#/c/2896/7/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

Line 509:  
of each row for the current batch


Line 512: be
delete


Line 512:  
for the current batch


http://gerrit.cloudera.org:8080/#/c/2896/7/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

Line 400: /// To access the current row, use 'iter.Get()'. This macro cannot be 
nested.
Why this change?  In any case, if you really want to expose the iterator, you 
should let the "caller" name it via a third argument:

FOREACH_ROW(_row_batch, _start_row_idx, _iter)

so that there's less magic (and things like nested loops work).


-- 
To view, visit http://gerrit.cloudera.org:8080/2896
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib85e7fc162ad25c849b9e716b629e226697cd940
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to