Michael Ho has posted comments on this change. Change subject: IMPALA-3286: Software prefetching for hash table build. ......................................................................
Patch Set 2: (20 comments) http://gerrit.cloudera.org:8080/#/c/2896/2//COMMIT_MSG Commit Message: Line 17: > evaluation Done Line 18: is empty. > or the hash value doesn't match, right? Done http://gerrit.cloudera.org:8080/#/c/2896/2/be/src/exec/hash-table.cc File be/src/exec/hash-table.cc: Line 183: EvalBuildRow > does this call get replaced by codegen? doesn't seem so given my comment i The code is generated by CodegenEquals() which was passed the pointer to the runtime generated 'eval_row_fn'. Therefore, it's calling the eval_row_fn already in the generated code of Equals(). 'eval_row_fn' has an 'always-inline' attribute so it will be inlined during the inlining pass. Line 186: for (int i = 0; i < build_expr_ctxs_.size(); ++i) { > not this patch, but this loop looks like a codegen candidate. I believe it's already being codegen'ed by CodegenEquals(). Line 868: builder.CreateCondBr(need_eval, need_eval_block, skip_eval_block); > why do we need this cmp/br? can't we know at codegen time whether we're us It's possible that we call Equals() multiple times during probing but we only need to lazily evaluate the expression once. If we don't have cmp/br, we may end up evaluating the same expression on the same row multiple times. http://gerrit.cloudera.org:8080/#/c/2896/2/be/src/exec/hash-table.h File be/src/exec/hash-table.h: Line 250: functions > what other functions is this comment talking about? Not sure. I just moved this declaration to the right place in the code without changing its content. I think it means 'function'. Line 294: expr_values_row_field_idx > Is there no way to do this based on the field name and the types? this is I tried looking up other ways but I am not sure if there is a way to do so. Anyway, this field has been removed in the new patch. Line 407: > the Done http://gerrit.cloudera.org:8080/#/c/2896/2/be/src/exec/partitioned-hash-join-node-ir.cc File be/src/exec/partitioned-hash-join-node-ir.cc: Line 310: get > see my comment in the header file. if you make this hash_values_.data(), th Done Line 312: RowBatch* prefetch_batch = batch; > why this renaming? To avoid conflicting with the name of the iterator below. The new patch has an updated version of this macro which works around this problem. Line 314: ! > easier to read without the ! and just reverse the if-else. Done Line 318: Evaluate the effective prefetch batch size > what is this saying? Figure out the optimal batch size for prefetching? Yes. Comment updated. Line 332: empty > or the hash value mismatches, no? don't we always compare hash values befor Yes. Comment updated. Line 335: ht_ctx->SetRowToEval(row); > rather than setting this in ht_ctx, why not pass it as a paramter all the w Yes, we can pass the row through. I kept it in ht_ctx to be consistent with the way we keep expr_values_buffer_ in it. Another complication is that we need a way to indicate whether lazy evaluation is needed (which is why we set expr_values_row_ to NULL after lazy evaluation occurs) and that's also why we need to reset the state below in case Equals() is not called. That said, we can do better. The new patch will pass TupleRow** to Equals() which will be set to NULL after lazy evaluation. TupleRow** is pointing to a variable on the stack so the scope is limited to the function Probe(). Line 344: ht_ctx->SetRowToEval(NULL); > doesn't Equals() do this? why is this needed? Not necessarily if Equals() is not called. http://gerrit.cloudera.org:8080/#/c/2896/2/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: Line 338: (state->batch_size() + 7) / 8 > BitUtil::Ceil()? Done Line 1606: 1 > why doesn't this need updating (now that their's a call inside Equals())? I think you meant to ask about the DCHECK in CodegenInsertBatch(). The answer is that we directly pass eval_row_fn to CodegenEquals() instead of relying on ReplaceCallSites() to replace the call site. http://gerrit.cloudera.org:8080/#/c/2896/2/be/src/exec/partitioned-hash-join-node.h File be/src/exec/partitioned-hash-join-node.h: Line 478: > is an array containing Done Line 510: scoped_ptr > I don't think this works correct. Doesn't it need to be scoped_array<> so Good point. Converted to using vector. Line 513: scoped_ptr > same. also, is there a good reason to not use Bitmap class? Not really. Just didn't know such utility already existed. -- To view, visit http://gerrit.cloudera.org:8080/2896 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib85e7fc162ad25c849b9e716b629e226697cd940 Gerrit-PatchSet: 2 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
