Dan Hecht has posted comments on this change.

Change subject: IMPALA-3286: Software prefetching for hash table build.
......................................................................


Patch Set 2:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/2896/2//COMMIT_MSG
Commit Message:

Line 17:  
evaluation


Line 18: is empty.
or the hash value doesn't match, right?


http://gerrit.cloudera.org:8080/#/c/2896/2/be/src/exec/hash-table.cc
File be/src/exec/hash-table.cc:

Line 183: EvalBuildRow
does this call get replaced by codegen?  doesn't seem so given my comment in 
partitioned-hash-join.cc.


Line 186:   for (int i = 0; i < build_expr_ctxs_.size(); ++i) {
not this patch, but this loop looks like a codegen candidate.


Line 868:     builder.CreateCondBr(need_eval, need_eval_block, skip_eval_block);
why do we need this cmp/br?  can't we know at codegen time whether we're using 
lazy expr evaluation or not? i.e. isn't it already the case that if eval_row_fn 
!= NULL then at runtime we'll be using lazy evaluation?


http://gerrit.cloudera.org:8080/#/c/2896/2/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

Line 250: functions
what other functions is this comment talking about?


Line 294: expr_values_row_field_idx
> all caps since it's a constant.
Is there no way to do this based on the field name and the types?  this is 
fragile.


Line 407:  
the


http://gerrit.cloudera.org:8080/#/c/2896/2/be/src/exec/partitioned-hash-join-node-ir.cc
File be/src/exec/partitioned-hash-join-node-ir.cc:

Line 310: get
see my comment in the header file. if you make this hash_values_.data(), the 
loop code would be equivalent.

If you do that, how about reverting the RowIdx parameter change, and then doing 
the .data() on that here too.


Line 312:   RowBatch* prefetch_batch = batch;
why this renaming?


Line 314: !
easier to read without the ! and just reverse the if-else.


Line 318: Evaluate the effective prefetch batch size
what is this saying? Figure out the optimal batch size for prefetching?


Line 332: empty
or the hash value mismatches, no? don't we always compare hash values before 
calling Equals()?


Line 335:       ht_ctx->SetRowToEval(row);
rather than setting this in ht_ctx, why not pass it as a paramter all the way 
through to Equals()?  That way, there's no state that needs to be reset.  Does 
that not work out nicely in some way?

Once you also do prefetching for probe and agg, won't lazy evaluation be the 
only way?


Line 344:   ht_ctx->SetRowToEval(NULL);
doesn't Equals() do this?  why is this needed?


http://gerrit.cloudera.org:8080/#/c/2896/2/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 338: (state->batch_size() + 7) / 8
BitUtil::Ceil()?


Line 1606: 1
why doesn't this need updating (now that their's a call inside Equals())?


http://gerrit.cloudera.org:8080/#/c/2896/2/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

Line 478:  
is an array containing


Line 510: scoped_ptr
I don't think this works correct.  Doesn't it need to be scoped_array<> so that 
delete[] is called?

But in any case, why not make it a vector. It should be the same level of 
indirection (once), and if for some reason the generated code is not as good, 
you can still get the array using data().


Line 513: scoped_ptr
same. also, is there a good reason to not use Bitmap class?


-- 
To view, visit http://gerrit.cloudera.org:8080/2896
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib85e7fc162ad25c849b9e716b629e226697cd940
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to