Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3286: Software prefetching for hash table build.
......................................................................
Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/2896/1//COMMIT_MSG
Commit Message:

Line 21: with lineitem reduces by more than half (going from 10.5s to 4.5s).
Very nice!

http://gerrit.cloudera.org:8080/#/c/2896/1/be/src/exec/hash-table.cc
File be/src/exec/hash-table.cc:

Line 864: 12
Hmm, this is the index of the field in the class? I think it's difficult to avoid hardcoding it somewhere, but we should make it clear in the class definition that the index matters and needs to be updated if members are rearranged. E.g. make this a constant in the class next to that member? (See the first sketch appended at the end of this message.)

http://gerrit.cloudera.org:8080/#/c/2896/1/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

Line 295: /// for matches. It's not evaluated if probing doesn't call Equals().
There seems to be some logic that if this is NULL, it won't be re-evaluated - could you explain this in the comment?

Line 296: TupleRow* expr_values_row_;
> Thanks for the pointer to that patch. That will be very helpful. My prefere
I'm OK with moving ahead with this code so long as it's not the permanent solution.

http://gerrit.cloudera.org:8080/#/c/2896/1/be/src/exec/hash-table.inline.h
File be/src/exec/hash-table.inline.h:

Line 122: PrefetchBucket
PrefetchBucketForWrite()? To make it clear that the flags passed to __builtin_prefetch aren't quite right if you're just reading the bucket, e.g. when probing. (See the second sketch appended at the end of this message.)

http://gerrit.cloudera.org:8080/#/c/2896/1/be/src/exec/partitioned-hash-join-node-ir.cc
File be/src/exec/partitioned-hash-join-node-ir.cc:

Line 313: FOREACH_ROW(prefetch_batch, 0, row) {
Prefetching 1024 rows at a time seems suboptimal. If the cache line is 64 bytes and the L1 cache is 64 KB, there are going to be a significant number of conflict misses between buckets and with other data if you try to prefetch 1024 buckets. I'm OK with this for now, since it's obviously still very beneficial - all the buckets should fit in L2 and many will fit in L1 - but it might be worth a comment. I think this relates to my other comment about processing subsets of the batch. (See the third sketch appended at the end of this message.)

Line 329: ht_ctx->SetRowToEval(row);
The logic with SetRowToEval() is kind of tricky - maybe it needs a comment to explain that it needs to be called before Insert().

http://gerrit.cloudera.org:8080/#/c/2896/1/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 337: hash_values_.reset(new uint32_t[state->batch_size()]);
> I tried implementing that too but that's a nested loop inside the FOREACH_R
1024 should be the default max. There are some places where larger batches are created, but they shouldn't be passed between operators. Also, I think this batch is actually sized by this operator, so we could apply a max there and keep it local to this node. I believe Mostafa had looked at increasing the row batch size and it has benefits for many queries. I don't know if anyone actually changes the batch size in production.
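Sketch for the hash-table.cc:864 comment: roughly what I mean by keeping the index next to the member it refers to. The constant name and surrounding members are made up for illustration, not the actual layout in hash-table.h:

  class HashTableCtx {
    ...
    /// Hypothetical constant: index of expr_values_row_ among this class's
    /// fields, as referenced by the codegen'd code. Must be updated if
    /// members are rearranged.
    static const int EXPR_VALUES_ROW_FIELD_IDX = 12;

    /// Keep this member at field index EXPR_VALUES_ROW_FIELD_IDX.
    TupleRow* expr_values_row_;
    ...
  };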
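Sketch for the hash-table.inline.h:122 comment: __builtin_prefetch's second argument is a read/write hint (0 = read, 1 = write), so a write-flavoured helper isn't the right thing to call on the probe path. A minimal sketch, assuming buckets_/num_buckets_ members and a power-of-two bucket count; not the patch's actual code:

  // Prefetch the bucket for an upcoming insert: rw hint = 1 (write).
  inline void HashTable::PrefetchBucketForWrite(uint32_t hash) {
    __builtin_prefetch(&buckets_[hash & (num_buckets_ - 1)], 1, 1);
  }

  // Prefetch the bucket for an upcoming probe: rw hint = 0 (read).
  inline void HashTable::PrefetchBucketForRead(uint32_t hash) {
    __builtin_prefetch(&buckets_[hash & (num_buckets_ - 1)], 0, 1);
  }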
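Sketch for the partitioned-hash-join-node-ir.cc:313 comment, to spell out the arithmetic: 1024 prefetched buckets * 64-byte cache lines = 64 KB, i.e. an entire typical L1 data cache, so later prefetches can evict earlier ones before they're used. Processing the batch in chunks bounds the prefetch footprint. The chunk size, helper names, and Insert() signature below are illustrative, not the patch's actual code:

  // Hypothetical chunk size: 64 buckets * 64-byte lines = 4 KB, a small
  // fraction of a 32-64 KB L1 data cache.
  static const int PREFETCH_CHUNK_SIZE = 64;

  for (int begin = 0; begin < batch->num_rows(); begin += PREFETCH_CHUNK_SIZE) {
    int end = std::min(begin + PREFETCH_CHUNK_SIZE, batch->num_rows());
    // Pass 1: issue prefetches for this chunk's buckets only.
    for (int i = begin; i < end; ++i) {
      hash_tbl->PrefetchBucketForWrite(hash_values_[i]);
    }
    // Pass 2: insert the chunk's rows while their buckets are still cached.
    for (int i = begin; i < end; ++i) {
      TupleRow* row = batch->GetRow(i);
      ht_ctx->SetRowToEval(row);  // Must be called before Insert().
      if (UNLIKELY(!hash_tbl->Insert(ht_ctx, row, hash_values_[i]))) return false;
    }
  }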
--
To view, visit http://gerrit.cloudera.org:8080/2896
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib85e7fc162ad25c849b9e716b629e226697cd940
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes