Dan Hecht has posted comments on this change. Change subject: IMPALA-3286: Prefetching for PHJ probing. ......................................................................
Patch Set 6: (18 comments) http://gerrit.cloudera.org:8080/#/c/2959/6/be/src/exec/hash-table.h File be/src/exec/hash-table.h: Line 316: values value (since "each") Line 346: value's values (plural not possessive) and could delete "buffer" http://gerrit.cloudera.org:8080/#/c/2959/6/be/src/exec/partitioned-hash-join-node-ir.cc File be/src/exec/partitioned-hash-join-node-ir.cc: Line 373: bool has_probe_rows = current_probe_row_ != NULL || !probe_batch_iterator.AtEnd(); I find the positive conditionals easier to reason about, but if you or Tim find the current code easier, than okay to leave: // If there's no left over from the previous call and we've iterated the entire row batch, then we're done with this batch. bool batch_done = current_probe_row == NULL && probe_batch_iterator.AtEnd(); while (!batch_done ...) { ... batch_done = current_probe_row == NULL; } Line 383: Prefetching if enabled Prefetching, if enabled, Line 387: In this case, prefetching may not be effective as there aren't : // enough cycles for prefetched items to trickle in. this sentence is still confusing. I think what you mean is that we shouldn't bother to prefetch in this case, but it makes it sound like we do prefetch but it might not be effective. I think just delete this last sentence. Line 410: has_probe_rows = current_probe_row_ != NULL; maybe add: if (!has_probe_rows) DCHECK(probe_batch_iterator.AtEnd()); that implication has to hold, correct? Line 413: probe_batch_->num_tuples_per_row(); just to check my understanding, if we're AtEnd(), then probe_batch_pos_ = batch->num_rows_, since Get() will be one past the last? is that right? Line 475: cur_row how did the old code work? Line 494: if (UNLIKELY(!hash_tbl_->Insert(ht_ctx, row_idx, row))) { can combine these two if-stmts http://gerrit.cloudera.org:8080/#/c/2959/6/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: Line 677: // 'probe_exp_ctxs_' should have made no local allocations in this function. is there a simple DCHECK we could write to verify that? Line 1032: DCHECK(probe_batch_pos_ == -1 || ht_ctx_->expr_values_cache()->AtEnd()); why can't this just be: DCHECK(ht_ctx_->expr_values_cache()->AEnd()); ? which is what the comment says. Line 1754: DCHECK(process_probe_batch_fn->getLinkage() == GlobalValue::WeakODRLinkage) out of curiosity why does this end up weak now but not before? http://gerrit.cloudera.org:8080/#/c/2959/6/be/src/exec/partitioned-hash-join-node.h File be/src/exec/partitioned-hash-join-node.h: Line 252: probing probe Line 254: in Line 255: free space doesn't this always reset the cache? "free space" seems to imply it won't but instead just uses what's remaining. maybe say "depends on the capacity of ..." Line 579: Document the return value. Line 579: constant. a constant http://gerrit.cloudera.org:8080/#/c/2959/6/be/src/runtime/row-batch.h File be/src/runtime/row-batch.h: Line 409: . Will iterate up till the number : /// of rows in the given row batch if '_start_row_idx + _limit' is beyond the last row. I was just being dumb when I didn't notice how limit works. I think you should just delete this sentence like you had it before. -- To view, visit http://gerrit.cloudera.org:8080/2959 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib42b93d99d09c833571e39d20d58c11ef73f3cc0 Gerrit-PatchSet: 6 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Huaisi Xu <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
