Michael Ho has posted comments on this change.

Change subject: IMPALA-3286: Software prefetching for hash table build.
......................................................................


Patch Set 2:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/2896/2//COMMIT_MSG
Commit Message:

Line 17:  
> evaluation
Done


Line 18: is empty.
> or the hash value doesn't match, right?
Done


http://gerrit.cloudera.org:8080/#/c/2896/2/be/src/exec/hash-table.cc
File be/src/exec/hash-table.cc:

Line 183: EvalBuildRow
> does this call get replaced by codegen?  doesn't seem so given my comment i
The code is generated by CodegenEquals() which was passed the pointer to the 
runtime generated 'eval_row_fn'. Therefore, it's calling the eval_row_fn 
already in the generated code of Equals(). 'eval_row_fn' has an 'always-inline' 
attribute so it will be inlined during the inlining pass.


Line 186:   for (int i = 0; i < build_expr_ctxs_.size(); ++i) {
> not this patch, but this loop looks like a codegen candidate.
I believe it's already being codegen'ed by CodegenEquals().


Line 868:     builder.CreateCondBr(need_eval, need_eval_block, skip_eval_block);
> why do we need this cmp/br?  can't we know at codegen time whether we're us
It's possible that we call Equals() multiple times during probing but we only 
need to lazily evaluate the expression once. If we don't have cmp/br, we may 
end up evaluating the same expression on the same row multiple times.


http://gerrit.cloudera.org:8080/#/c/2896/2/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

Line 250: functions
> what other functions is this comment talking about?
Not sure. I just moved this declaration to the right place in the code without 
changing its content. I think it means 'function'.


Line 294: expr_values_row_field_idx
> Is there no way to do this based on the field name and the types?  this is 
I tried looking up other ways but I am not sure if there is a way to do so. 
Anyway, this field has been removed in the new patch.


Line 407:  
> the
Done


http://gerrit.cloudera.org:8080/#/c/2896/2/be/src/exec/partitioned-hash-join-node-ir.cc
File be/src/exec/partitioned-hash-join-node-ir.cc:

Line 310: get
> see my comment in the header file. if you make this hash_values_.data(), th
Done


Line 312:   RowBatch* prefetch_batch = batch;
> why this renaming?
To avoid conflicting with the name of the iterator below. The new patch has an 
updated version of this macro which works around this problem.


Line 314: !
> easier to read without the ! and just reverse the if-else.
Done


Line 318: Evaluate the effective prefetch batch size
> what is this saying? Figure out the optimal batch size for prefetching?
Yes. Comment updated.


Line 332: empty
> or the hash value mismatches, no? don't we always compare hash values befor
Yes. Comment updated.


Line 335:       ht_ctx->SetRowToEval(row);
> rather than setting this in ht_ctx, why not pass it as a paramter all the w
Yes, we can pass the row through. I kept it in ht_ctx to be consistent with the 
way we keep expr_values_buffer_ in it. Another complication is that we need a 
way to indicate whether lazy evaluation is needed (which is why we set 
expr_values_row_ to NULL after lazy evaluation occurs) and that's also why we 
need to reset the state below in case Equals() is not called.

That said, we can do better. The new patch will pass TupleRow** to Equals() 
which will be set to NULL after lazy evaluation. TupleRow** is pointing to a 
variable on the stack so the scope is limited to the function Probe().


Line 344:   ht_ctx->SetRowToEval(NULL);
> doesn't Equals() do this?  why is this needed?
Not necessarily if Equals() is not called.


http://gerrit.cloudera.org:8080/#/c/2896/2/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 338: (state->batch_size() + 7) / 8
> BitUtil::Ceil()?
Done


Line 1606: 1
> why doesn't this need updating (now that their's a call inside Equals())?
I think you meant to ask about the DCHECK in CodegenInsertBatch(). The answer 
is that we directly pass eval_row_fn to CodegenEquals() instead of relying on 
ReplaceCallSites() to replace the call site.


http://gerrit.cloudera.org:8080/#/c/2896/2/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

Line 478:  
> is an array containing
Done


Line 510: scoped_ptr
> I don't think this works correct.  Doesn't it need to be scoped_array<> so 
Good point. Converted to using vector.


Line 513: scoped_ptr
> same. also, is there a good reason to not use Bitmap class?
Not really. Just didn't know such utility already existed.


-- 
To view, visit http://gerrit.cloudera.org:8080/2896
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib85e7fc162ad25c849b9e716b629e226697cd940
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to