Tim Armstrong has posted comments on this change. Change subject: IMPALA-3286: Prefetching for PHJ probing. ......................................................................
Patch Set 2: (16 comments) I did an initial pass. I think we need to give the abstractions some more thought. I think then the code will get simpler and we'll be able to reason better about the correctness of the changes to the join algorithm. http://gerrit.cloudera.org:8080/#/c/2959/2/be/src/exec/hash-table.cc File be/src/exec/hash-table.cc: Line 127: Status HashTableCtx::Init(RuntimeState* state) { It would be kind of nice if we could just allocate all this stuff from a MemPool (not sure which MemPool though). Line 130: MemTracker* query_mem_tracker = state->query_mem_tracker(); We should account it against the exec node's MemTracker, not the query. Line 133: int expected_usage = expr_values_size + expr_values_null_bits_size + This calculation seems very error-prone and tied in with the allocations below. Is there any way to restructure it to make it easier to reason about. Line 134: BitUtil::Ceil(capacity_, 8) I think the BitMap has a function to do this calculation. Line 757: Value* expr_values_buffer_ptr = I think this works out simpler if you make the buffers external to HashTableContext, since then these values are just function arguments rather than values we have to pull out from a hardcoded address. http://gerrit.cloudera.org:8080/#/c/2959/2/be/src/exec/hash-table.h File be/src/exec/hash-table.h: Line 153: /// Call to cleanup any resources. Should document whether it's ok to call this if Init() fails. It's not really clear whether it is right now. Line 363: /// Pointer into 'expr_values_buffer_array_' for the cached evaluated expression As discussed, I feel pretty strongly that the expression buffering should be a separate abstraction from the hash table context. I think we should move towards the HashTableContext being stateless. I also think it would be better to avoid having implicit iterators tied in with the buffers - let the caller use raw pointers, or provide an Iterator() abstraction Line 380: indiciate indicate. Line 389: boost::scoped_ptr<Bitmap> ignored_bitmap_; Can we save an indirection here by just storing the bitmap inline? Line 453: PrefetchState PrefetchMode? State makes me thing that this is asome kind of state machine. http://gerrit.cloudera.org:8080/#/c/2959/2/be/src/exec/hash-table.inline.h File be/src/exec/hash-table.inline.h: Line 46: inline void HashTableCtx::PrepareForWrite() { This assumes a particular usage pattern (linear read/write passes). I think a better abstraction would let callers also do random access if they wanted to. http://gerrit.cloudera.org:8080/#/c/2959/2/be/src/exec/partitioned-hash-join-node-ir.cc File be/src/exec/partitioned-hash-join-node-ir.cc: Line 260: ht_ctx->NextRow(); I think this implicit iterator is kind of confusing to follow in here. It's really hard to see that it's advanced in sync with the probe batch iterator. Line 342: if (current_probe_row_ != NULL) { Could we factor out the ProcessProbeRow stuff into a separate function? It would make it easier to follow the high-level logic of the function. Line 386: } while (current_probe_row_ != NULL && remaining_capacity > 0 && status->ok()); It would probably be more readable as a while() loop since the condition would be closer to the initialisation up the top. http://gerrit.cloudera.org:8080/#/c/2959/2/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: Line 466: prefetch_mode Is this an enum defined somewhere? It's hard to know what this is or what the values mean. http://gerrit.cloudera.org:8080/#/c/2959/2/be/src/exec/partitioned-hash-join-node.h File be/src/exec/partitioned-hash-join-node.h: Line 251: int It looks like this comes from an enum, so we should use the enum type instead of int. -- To view, visit http://gerrit.cloudera.org:8080/2959 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib42b93d99d09c833571e39d20d58c11ef73f3cc0 Gerrit-PatchSet: 2 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
