Dan Hecht has posted comments on this change. Change subject: IMPALA-3286: Software prefetching for hash table build. ......................................................................
Patch Set 2: (20 comments) http://gerrit.cloudera.org:8080/#/c/2896/2//COMMIT_MSG Commit Message: Line 17: evaluation Line 18: is empty. or the hash value doesn't match, right? http://gerrit.cloudera.org:8080/#/c/2896/2/be/src/exec/hash-table.cc File be/src/exec/hash-table.cc: Line 183: EvalBuildRow does this call get replaced by codegen? doesn't seem so given my comment in partitioned-hash-join.cc. Line 186: for (int i = 0; i < build_expr_ctxs_.size(); ++i) { not this patch, but this loop looks like a codegen candidate. Line 868: builder.CreateCondBr(need_eval, need_eval_block, skip_eval_block); why do we need this cmp/br? can't we know at codegen time whether we're using lazy expr evaluation or not? i.e. isn't it already the case that if eval_row_fn != NULL then at runtime we'll be using lazy evaluation? http://gerrit.cloudera.org:8080/#/c/2896/2/be/src/exec/hash-table.h File be/src/exec/hash-table.h: Line 250: functions what other functions is this comment talking about? Line 294: expr_values_row_field_idx > all caps since it's a constant. Is there no way to do this based on the field name and the types? this is fragile. Line 407: the http://gerrit.cloudera.org:8080/#/c/2896/2/be/src/exec/partitioned-hash-join-node-ir.cc File be/src/exec/partitioned-hash-join-node-ir.cc: Line 310: get see my comment in the header file. if you make this hash_values_.data(), the loop code would be equivalent. If you do that, how about reverting the RowIdx parameter change, and then doing the .data() on that here too. Line 312: RowBatch* prefetch_batch = batch; why this renaming? Line 314: ! easier to read without the ! and just reverse the if-else. Line 318: Evaluate the effective prefetch batch size what is this saying? Figure out the optimal batch size for prefetching? Line 332: empty or the hash value mismatches, no? don't we always compare hash values before calling Equals()? Line 335: ht_ctx->SetRowToEval(row); rather than setting this in ht_ctx, why not pass it as a paramter all the way through to Equals()? That way, there's no state that needs to be reset. Does that not work out nicely in some way? Once you also do prefetching for probe and agg, won't lazy evaluation be the only way? Line 344: ht_ctx->SetRowToEval(NULL); doesn't Equals() do this? why is this needed? http://gerrit.cloudera.org:8080/#/c/2896/2/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: Line 338: (state->batch_size() + 7) / 8 BitUtil::Ceil()? Line 1606: 1 why doesn't this need updating (now that their's a call inside Equals())? http://gerrit.cloudera.org:8080/#/c/2896/2/be/src/exec/partitioned-hash-join-node.h File be/src/exec/partitioned-hash-join-node.h: Line 478: is an array containing Line 510: scoped_ptr I don't think this works correct. Doesn't it need to be scoped_array<> so that delete[] is called? But in any case, why not make it a vector. It should be the same level of indirection (once), and if for some reason the generated code is not as good, you can still get the array using data(). Line 513: scoped_ptr same. also, is there a good reason to not use Bitmap class? -- To view, visit http://gerrit.cloudera.org:8080/2896 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib85e7fc162ad25c849b9e716b629e226697cd940 Gerrit-PatchSet: 2 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
