Michael Ho has posted comments on this change.

Change subject: IMPALA-3286: Software prefetching for hash table build.
......................................................................
Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/2896/1/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

Line 296: TupleRow* expr_values_row_;
> I personally feel like the original design where the buffers are embedded i
Thanks for the pointer to that patch; it will be very helpful. My preference would be to get this code in for 2.6 soon and do the clean-up separately (i.e. brushing up your patch, which also looks non-trivial). Your patch may also be useful as a follow-up if we want to try the idea of saving the build-expression values computed during prefetching.

http://gerrit.cloudera.org:8080/#/c/2896/1/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 337: hash_values_.reset(new uint32_t[state->batch_size()]);
> Even if the input batch is big, could you just process a subset of it at a
I tried implementing that too, but it requires a nested loop inside the FOREACH_ROW iterator. The code gets rather complicated, though it's doable. If we could set a reasonable maximum size for a row batch when it's created, things would be easier. Is 1024 a reasonable cap? It may not make sense for rows with, say, a single tinyint, but is it a reasonable number for the common case?

--
To view, visit http://gerrit.cloudera.org:8080/2896
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib85e7fc162ad25c849b9e716b629e226697cd940
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
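[Editor's note] The chunked-processing idea discussed on Line 337 can be sketched roughly as below. This is a standalone illustration, not Impala code: kChunkSize, HashRow(), and BuildWithChunkedPrefetch() are hypothetical names, and a counting bucket array stands in for real hash-table insertion. The point is that the scratch hash-value array stays bounded at the chunk size regardless of the actual batch size, and each chunk gets a hash-and-prefetch pass before the insert pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#include <algorithm>

// Hypothetical chunk cap; 1024 is the candidate size discussed in the review.
constexpr size_t kChunkSize = 1024;

// Stand-in for evaluating and hashing the build expressions of one row.
uint32_t HashRow(uint64_t row_value) {
  uint64_t h = row_value * 0x9E3779B97F4A7C15ULL;  // simple multiplicative mix
  return static_cast<uint32_t>(h >> 32);
}

// Process the batch in fixed-size chunks. Phase 1 hashes a chunk and issues a
// software prefetch for each target bucket; phase 2 revisits the same chunk
// and "inserts" (here: increments a counter), ideally hitting warm cache
// lines. Returns the number of rows processed.
size_t BuildWithChunkedPrefetch(const std::vector<uint64_t>& batch,
                                std::vector<uint32_t>& buckets) {
  uint32_t hash_values[kChunkSize];  // bounded scratch, like hash_values_
  size_t inserted = 0;
  for (size_t base = 0; base < batch.size(); base += kChunkSize) {
    const size_t chunk = std::min(kChunkSize, batch.size() - base);
    for (size_t i = 0; i < chunk; ++i) {
      hash_values[i] = HashRow(batch[base + i]);
      __builtin_prefetch(&buckets[hash_values[i] % buckets.size()]);
    }
    for (size_t i = 0; i < chunk; ++i) {
      ++buckets[hash_values[i] % buckets.size()];  // stand-in for insertion
      ++inserted;
    }
  }
  return inserted;
}
```

The nested loop this implies inside the row iterator is exactly the complication mentioned above; capping the batch size at creation would let the outer chunk loop disappear.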
