Michael Ho has posted comments on this change.

Change subject: IMPALA-3286: Software prefetching for hash table build.
......................................................................


Patch Set 7:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/2896/7/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

Line 544: row_
> I believe this is an alias for scratch_row_. If that's correct, let's renam
Done


Line 578: If EvalAndHashBuild() and EvalAndHashProbe()
        :   /// aren't called before this function,
> this is kind of misleading cause it won't happen automatically -- the calle
Comment has been rephrased as suggested.


http://gerrit.cloudera.org:8080/#/c/2896/7/be/src/exec/partitioned-hash-join-node-ir.cc
File be/src/exec/partitioned-hash-join-node-ir.cc:

Line 311:   uint32_t* cur_hash_value = hash_values_.data();
> DCHECK_LE(batch->num_rows(), hash_values_.size());
Done


Line 318: hash_values_.data()
> this probably results in a load. you could either hoist it, or just iterate
Used .data() instead in the new code.
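
For reference, a minimal sketch of the hoisting idea discussed above, together with a bounds check in the spirit of the suggested DCHECK_LE. HashBuffer, ToyHash() and HashBatch() are hypothetical stand-ins, not the code in partitioned-hash-join-node-ir.cc, and a plain assert() is used in place of DCHECK_LE so the example is self-contained:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the join node; only the hash buffer is modeled.
struct HashBuffer {
  std::vector<uint32_t> hash_values_;

  // Placeholder hash for illustration only, not Impala's hash function.
  static uint32_t ToyHash(uint32_t key) { return key * 2654435761u; }

  void HashBatch(const std::vector<uint32_t>& keys) {
    // Plays the role of DCHECK_LE(batch->num_rows(), hash_values_.size()).
    assert(keys.size() <= hash_values_.size());
    // Read the buffer pointer once, before the loop, instead of calling
    // hash_values_.data() on every iteration.
    uint32_t* cur_hash_value = hash_values_.data();
    for (std::size_t i = 0; i < keys.size(); ++i) {
      *cur_hash_value++ = ToyHash(keys[i]);
    }
  }
};
```

Whether the compiler would actually reload the pointer on each iteration depends on what it can prove about aliasing, so the local copy is a precaution rather than a guaranteed win.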


Line 332:       null_bitmap_.Set<false>(i, false);
> why do we do this? are we just resetting it for the next batch? if that's t
Yes, that's the idea. I was trying to avoid some cache misses from memset, but 
that was moot as we always access the bitmap in the second loop anyway.


http://gerrit.cloudera.org:8080/#/c/2896/7/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

Line 509:  
> of each row for the current batch
Done


Line 512: be
> delete
Done


Line 512:  
> for the current batch
Done


http://gerrit.cloudera.org:8080/#/c/2896/7/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

Line 400: /// To access the current row, use 'iter.Get()'. This macro cannot be 
nested.
> Why this change?  In any case, if you really want to expose the iterator, y
The way it was written before required '_row_batch' to have a different name 
each time the macro was used more than once in the same scope.

I tried limiting the scope of the iterator by doing:

for (RowBatch::Iterator iter(...), TupleRow* row= ...; ....)

but apparently this won't compile. We can work around that with a pattern like:

FOREACH_ROW(...) {

} ROFEACH_ROW();

but I think the current approach isn't too bad either. The macro is updated as 
you suggested to allow callers to specify the name of the iterator.
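
To illustrate the point, here is a rough sketch of a macro that takes the iterator name from the caller. Batch, Row and SumOfProducts() are hypothetical stand-ins for RowBatch and its users, and the macro body is an assumption about the general shape, not necessarily what the patch contains:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins; not Impala's RowBatch or TupleRow.
struct Row { int value; };

class Batch {
 public:
  explicit Batch(std::vector<Row> rows) : rows_(std::move(rows)) {}

  class Iterator {
   public:
    Iterator(Batch* batch, std::size_t start) : batch_(batch), idx_(start) {}
    Row* Get() { return &batch_->rows_[idx_]; }
    void Next() { ++idx_; }
    bool AtEnd() const { return idx_ >= batch_->rows_.size(); }
   private:
    Batch* batch_;
    std::size_t idx_;
  };

 private:
  std::vector<Row> rows_;
};

// The init-statement of a 'for' is a single declaration, so it cannot declare
// both an Iterator and a Row* (two unrelated types). Declaring only the
// iterator there keeps it scoped to the loop; the caller picks its name.
#define FOREACH_ROW(batch, start, iter) \
  for (Batch::Iterator iter(batch, start); !iter.AtEnd(); iter.Next())

// Nested use with two distinct, caller-chosen iterator names.
int SumOfProducts(Batch* batch) {
  int sum = 0;
  FOREACH_ROW(batch, 0, outer) {
    FOREACH_ROW(batch, 0, inner) {
      sum += outer.Get()->value * inner.Get()->value;
    }
  }
  return sum;
}
```

With the name supplied by the caller, two uses in the same scope (or nested, as here) no longer collide on a fixed identifier.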


-- 
To view, visit http://gerrit.cloudera.org:8080/2896
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib85e7fc162ad25c849b9e716b629e226697cd940
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
