Tim Armstrong has posted comments on this change. Change subject: IMPALA-3286: Prefetching for PHJ probing. ......................................................................
Patch Set 11: (12 comments) This is looking good, I think there are just a few remaining opportunities to make things simpler. http://gerrit.cloudera.org:8080/#/c/2959/11/be/src/exec/hash-table-test.cc File be/src/exec/hash-table-test.cc: Line 185: runtime_state_->InitMemTrackers(TUniqueId(), NULL, -1); Let's just do this in CreateQueryState(). Line 241: runtime_state_->query_mem_tracker() It would be better to use the test mem_tracker_ we set up. http://gerrit.cloudera.org:8080/#/c/2959/11/be/src/exec/hash-table.cc File be/src/exec/hash-table.cc: Line 125: // TODO: use tr1::array? We can delete this TODO, I don't think it's very helpful, especially since we started using scoped_array in this class. Line 131: inline This probably doesn't need to be inline, does it? http://gerrit.cloudera.org:8080/#/c/2959/11/be/src/exec/hash-table.h File be/src/exec/hash-table.h: Line 126: /// Allocate various buffers for storing expression evaluation results, hash values, We should have a HashTableCtx::Create() function that allocates and initialises similar to HashTable::Create(). Would make it a bit simpler to use. Line 128: Status Init(RuntimeState* state, int num_build_tuples, MemTracker* tracker); Also mention what 'tracker' is (it seems at least as important to document that here as in ExprValueCache). Line 130: /// Call to cleanup any resources. Mention that 'tracker' is the same 'tracker' passed to Init(). Actually.. maybe just make 'tracker' a constructor or Init() argument and store it in HashTableCtx? That's what we typically do. Line 226: resourcs resources. Line 346: Bitmap null_bitmap_; I feel like it would be simpler to just do an array of uint8_t to be consistent with the other members. That avoids needing to do the special index calculation here. I don't see that there would be a big benefit from the increased memory density of the Bitmap here. http://gerrit.cloudera.org:8080/#/c/2959/11/be/src/exec/partitioned-hash-join-node-ir.cc File be/src/exec/partitioned-hash-join-node-ir.cc: Line 476: row_idices typo: row_indices Line 477: cur_row The logic around cur_row is a little hard to follow, since it is the loop variable for the outer loop but updated in the inner loop. Something like this would make the intent clearer: for (int group_start = 0; group_start < num_rows; group_start += prefetch_size) { ... int cur_row = group_start; ... } http://gerrit.cloudera.org:8080/#/c/2959/11/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: Line 678: probe_exp_ctxs_ Typoe: 'expr_ctxs_' -- To view, visit http://gerrit.cloudera.org:8080/2959 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib42b93d99d09c833571e39d20d58c11ef73f3cc0 Gerrit-PatchSet: 11 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Huaisi Xu <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
