Michael Ho has posted comments on this change.

Change subject: IMPALA-3286: Prefetching for PHJ probing.
......................................................................
Patch Set 4:

(91 comments)

http://gerrit.cloudera.org:8080/#/c/2959/4/be/src/exec/hash-table.cc
File be/src/exec/hash-table.cc:

Line 126: int32_t
> be careful when you merge with Tim's recent patch that fixes this to uint32
Done

Line 138: /// TODO: figure out which hash function to use. We need to generate uncorrelated
> This comment seems outdated. Let's remove it.
Done

Line 231: ;
> delete
Done

Line 241: state->batch_size()
> this is where the comment about finding a better maximum prefetch group siz
Done

Line 242: expr_values_bytes_per_row_
> Do we need to round up?
We should round down to stay within the budget.

Line 242: expr_values_bytes_per_row_
> Maybe we should DCHECK that this is > 0?
Done

Line 281: num_build_exprs
> num_exprs?
Done here and everywhere.

Line 293: return &cur_expr_values_null_[expr_idx];
> cur_expr_values_null_ + expr_idx
Done

Line 312: void HashTableCtx::ExprValuesCache::PrepareForRead() {
> // Remember for AtEnd().
Done

http://gerrit.cloudera.org:8080/#/c/2959/4/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

Line 140: IR_ALWAYS_INLINE
> why not ALWAYS_INLINE even for non-IR?
Done

Line 156: returning in *hash
> update indicating what the side effects are
Done

Line 185: .
> by the exec nodes. (to make it clear that the hash table itself doesn't do
Done

Line 186: Rows
> A set of rows of a row batch
Done

Line 187: all
: /// the evaluated values, null byte and hash values to avoid re-evaluating the rows
: /// again during probing
> the results of expression evaluation for the rows in the prefetch set, to a
Done

Line 191: values
> expression evaluation results
Done

Line 196: evaluated
> each evaluated expression in each row.
Done

Line 202: nsertion or parsing
> for performing a write pass followed by a read pass
Yes. It's possible to restart a read pass after the iterator reaches the end, but in general it's error-prone to restart a read pass. The current use case only needs one write and one read pass.

Line 203: or most of the data
: /// structures
> delete this (see comment around SetRowNull()
Done

Line 206: pointers
> iterator
Done

Line 207: pointers
> iterator
Done

Line 208: consumed
> this doesn't quite make sense for "write" mode.
Replaced with the word 'read' instead.

Line 211: /// row respectively.
> This comment can be removed since it seems redundant with the one for each
Done

Line 215: .
> (e.g. if an expression value is a StringValue).
Done

Line 231: Resets
> Sets the iterator to the beginning to start a write pass by resetting
Comments updated with a shorter version.

Line 232: IR_ALWAYS_INLINE
Remove.

Line 234: Resets
> Sets the iterator to the beginning to start a read pass by resetting the po
Done

Line 235: IR_ALWAYS_INLINE
Remove.

Line 237:
> row, by moving the pointers to the next entries ...
Done

Line 237: pointers
> iterator
Done

Line 238: IR_ALWAYS_INLINE
Remove.

Line 256: int IR_ALWAYS_INLINE CurIdx() const {
> make this private (see comments below).
Done

Line 260: all cached values have been consumed
> the read pass is complete.
Done

Line 265: /// Resets the ExprValuesCache to indicate it is empty.
> I think this should say:
Actually, there is too much duplication with PrepareForWrite(), so I will just remove PrepareForWrite() and call this function Reset(). I will rename PrepareForRead() to ResetForRead().

Line 266: void IR_ALWAYS_INLINE SetAtEnd();
> in the .cc file.
Done

Line 266: void IR_ALWAYS_INLINE SetAtEnd();
> Where is this defined?
In hash-table.cc. Will remove IR_ALWAYS_INLINE.
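For reference, the write pass / read pass usage discussed in the Line 202, Lines 231-238 and Line 265 comments above roughly follows the pattern below. The sketch uses the renamed Reset()/ResetForRead() methods mentioned in this thread, but the class and its implementation are only an illustration of the iteration protocol, not the actual hash-table.h code.

  #include <cstdint>
  #include <iostream>
  #include <vector>

  // Simplified stand-in for HashTableCtx::ExprValuesCache: one write pass fills
  // the cached hash values for a group of rows, then one read pass consumes
  // them. Only the iteration pattern discussed in the review is modeled here.
  class ExprValuesCacheSketch {
   public:
    explicit ExprValuesCacheSketch(int capacity)
        : capacity_(capacity), limit_(capacity), cur_idx_(0), hashes_(capacity) {}

    // Resets the iterator to the beginning and marks the cache as empty,
    // ready for a new write pass.
    void Reset() { cur_idx_ = 0; limit_ = capacity_; }

    // Resets the iterator to the beginning to start a read pass over the
    // rows written so far.
    void ResetForRead() { limit_ = cur_idx_; cur_idx_ = 0; }

    // Advances the iterator to the next row.
    void NextRow() { ++cur_idx_; }

    // True when the write pass has filled the cache or the read pass has
    // consumed all cached rows.
    bool AtEnd() const { return cur_idx_ >= limit_; }

    void SetExprValuesHash(uint32_t hash) { hashes_[cur_idx_] = hash; }
    uint32_t ExprValuesHash() const { return hashes_[cur_idx_]; }

   private:
    int capacity_;
    int limit_;
    int cur_idx_;
    std::vector<uint32_t> hashes_;
  };

  int main() {
    ExprValuesCacheSketch cache(4);

    // Write pass: evaluate/hash rows until the cache is full (or input runs out).
    cache.Reset();
    for (uint32_t fake_hash = 100; !cache.AtEnd(); ++fake_hash) {
      cache.SetExprValuesHash(fake_hash);  // stand-in for evaluating + hashing a row
      cache.NextRow();
    }

    // Read pass: probe with the cached hashes, without re-evaluating expressions.
    for (cache.ResetForRead(); !cache.AtEnd(); cache.NextRow()) {
      std::cout << "probe with hash " << cache.ExprValuesHash() << "\n";
    }
    return 0;
  }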
Line 271: return null_bitmap_.Get<false>(row_idx);
> to stick with the iterator interface (rather than this random access), how
Done

Line 275: void IR_ALWAYS_INLINE SetRowNull(int row_idx) {
> same comment
Done

Line 290: expressions
> should this be singular?
Done

Line 291: i
> might be nice to change these 'i' to 'expr_idx' for clarity here and below
Renamed.

Line 293: values
> singular?
Done

Line 294: expressions
> singular and no apostrophe
Done

Line 298: expressions'
> expression's
Done

Line 302: friend class HashTableCtx;
> does the HashTableCtx access these private fields directly? okay to leave i
Yes, it does for now during codegen and HashVariableLenRow().

Line 305: ResetPointers
> how about calling it ResetIterator() to distinguish that it doesn't reset t
Done

Line 307: memory
> in bytes?
Done

Line 310: capacity
> number of rows worth of expression evaluation state
Done

Line 314: substitution
> constant substitution
Done

Line 318: num_build_exprs_
> maybe just call it num_exprs_ since it's also used by probe exprs?
Done

Line 320:
> current row's cached expression values.
Done

Line 324: cached nullness of the
: /// expressions' values of the current row
> current row's cached nullness of each expression value.
Done

Line 328: cached expressions' hash values
: /// of the current row.
> cached hash value of the current row's expression values.
Done

Line 332: cached expressions'
: /// hash values.
> cached hash values.
Done

Line 336: rows'
> rows worth of
Done

Line 340: bits
> booleans (otherwise it's too easy to think this is a bitset).
Done

Line 340: rows'
> rows worth of
Done

Line 346: rows
> rows worth of hash values.
Done

Line 354: Stores the offsets into the expression values buffers of expressions' results.
> Maps from expression index to byte offset in a row's expression values buff
Done

Line 358: .
> for the row.
Done

Line 364: IR_ALWAYS_INLINE
> why not ALWAYS_INLINE?
Done

Line 432:
> expression evaluation
Done

Line 433: /// Used by PHJ to store results of batch evaluations of rows.
> The PHJ references will probably get stale. Maybe just say "Use to store re
Done

Line 539: 'row'
> what is this?
Comment fixed.

Line 541: values in 'ht_ctx'
> this seems state
Updated.

Line 736: /// this function.
> this could use an updated explanation
Done

http://gerrit.cloudera.org:8080/#/c/2959/4/be/src/exec/hash-table.inline.h
File be/src/exec/hash-table.inline.h:

Line 320: DCHECK_EQ((bucket_idx_ & ~(table_->num_buckets_ - 1)), 0);
> okay, how about a comment explaining this then:
Done

http://gerrit.cloudera.org:8080/#/c/2959/4/be/src/exec/partitioned-hash-join-node-ir.cc
File be/src/exec/partitioned-hash-join-node-ir.cc:

Line 254: int idx = expr_vals_cache->CurIdx();
> delete this (see hash-table.h comments)
Done

Line 289: uint32_t hash = expr_vals_cache->ExprValuesHash();
> FWIW I got some speedups for small hash tables in the agg by hoisting some
Interesting. Maybe the branch is not very predictable?!

Line 344: if (LIKELY(hash_tbl != NULL)) hash_tbl->PrefetchBucket<true>(hash);
> should this have a prefetch_mode check?
Oops... accidentally removed when experimenting with patches to reduce codegen time.

Line 369: current_probe_row_ != NULL || !probe_batch_iterator.AtEnd();
> It has to handle current_probe_row_ carrying over from the previous Process
Yes, what Tim said. Such confusion will go away if we remove current_probe_row_ and make probe_batch_pos_ point to the current row instead of the row after the current row. Not happening in this patch though.
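For reference, on the hash-table.inline.h Line 320 DCHECK discussed above: when the bucket count is a power of two, (bucket_idx_ & ~(num_buckets_ - 1)) == 0 is equivalent to bucket_idx_ < num_buckets_, because ~(num_buckets_ - 1) keeps exactly the bits above the valid index range. The small self-contained check below illustrates that equivalence; it is illustrative code only, not the Impala source.

  #include <cassert>
  #include <cstdint>

  int main() {
    const int64_t num_buckets = 1024;               // assumed to be a power of two
    assert((num_buckets & (num_buckets - 1)) == 0);
    for (int64_t bucket_idx = 0; bucket_idx < 4096; ++bucket_idx) {
      // With a power-of-two bucket count, masking out the low log2(num_buckets)
      // bits leaves zero exactly when the index is within [0, num_buckets).
      bool mask_says_valid = (bucket_idx & ~(num_buckets - 1)) == 0;
      bool range_says_valid = bucket_idx < num_buckets;
      assert(mask_says_valid == range_says_valid);
    }
    return 0;
  }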
Line 378: not
: // empty
> 'partially full' instead of 'not empty' seems clearer.
Done

Line 380: works
> work instead of works.
Done

Line 381: // prefetching to be effective.
> I don't understand what this comment is trying to say. first, the probe bat
As mentioned above, we may not have consumed the entire prefetch group's cached values, because the output batch may have filled up before all of the cached values were consumed. In that case, we won't have to re-evaluate the expressions; we will just feed off the remaining items in the prefetch group. That also means there aren't enough cycles of 'work' (i.e. evaluating the expressions and hashing) for the prefetched items to trickle in. I rephrased the comment a bit (but failed to shorten it). See if it makes more sense now.

Line 502: batch_size
> cache_size or prefetch_size or something. batch is too overloaded.
Done

Line 509: // processor dependent so we may need calibration at Impala startup time.
> the better place for this comment seems to be when we're calculating the ca
Done

Line 510: hash_tbl_->PrefetchBucket<false>(expr_vals_cache->ExprValuesHash());
> Tabs?
Oops... removed.

Line 523: TupleRow* row = batch_iter.Get();
> FWIW I got some speedups for small hash tables in the agg by hoisting some
Interesting. Maybe the branch is not very predictable?!

Line 527: }
> shouldn't we terminate this loop when we hit expr_vals_cache->AtEnd()? or d
What Tim said. Added some comments in row-batch.h.

http://gerrit.cloudera.org:8080/#/c/2959/4/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 652:
> if keeping this, please comment with something like:
Removed.

Line 912: during
> from
Comment removed.

Line 913: when probing the hash tables
> delete this - it's confusing and Equals() already tells you what you need t
Removed.

Line 913: We need to hold on to the local allocation of the
: // evaluated probe rows until this probe batch is done.
> This sentence is confusing because it's talking about what we need to do fo
Moved the comment to the point where we free local allocations of probe-side expressions.

Line 915: build_expr_ctxs_
> also, what about all the other ctxs QueryMaintenance frees? Instead of spec
The new patch keeps QueryMaintenance() here and frees local allocations of the probe expressions after getting a new probe batch. Please see the new comments in

Line 989: probe batch
> is this what's important? isn't it only safe once we reset the ExprValuesCa
It's safe in practice, but I see your point. I was just hoping that we could free up some memory before fetching the next row batch, but yes, it's better to reset the ExprValuesCache first so no client can iterate through the freed memory anymore.

http://gerrit.cloudera.org:8080/#/c/2959/4/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

Line 167: build_rows
> comment for this.
Removed in the new patch.

Line 259: /// that 'prefetch_mode' will be substituted with constants during codegen time.
> this comment needs updating. also it needs clarification that it only evalu
Done

Line 260: EvalAndHashProbeBatch
> Since this doesn't evaulate the whole probe batch, I think we should call i
Thanks. I picked EvalAndHashProbePrefetchGroup().

Line 615: build_rows
> can this use the array syntax to match the function decl above?
Removed in this patch.
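For reference, the probing flow discussed above (EvalAndHashProbePrefetchGroup(), the prefetch_mode check at Line 344, and the "enough work for prefetching to be effective" point at Line 381) amounts to a two-phase loop: hash a prefetch group of probe rows and issue bucket prefetches, then probe from the cached hashes. The sketch below illustrates that structure with made-up types and names; it is not the actual partitioned-hash-join-node-ir.cc code.

  #include <cstdint>
  #include <functional>
  #include <vector>

  // Assumed stand-ins for the real Impala types; illustration only.
  enum class TPrefetchMode { NONE, HT_BUCKET };

  struct HashTableSketch {
    explicit HashTableSketch(int num_buckets) : buckets(num_buckets) {}
    std::vector<uint64_t> buckets;
    void PrefetchBucket(uint32_t hash) const {
      __builtin_prefetch(&buckets[hash % buckets.size()]);  // GCC/Clang builtin
    }
    bool Probe(uint32_t hash) const {
      return buckets[hash % buckets.size()] != 0;  // placeholder probe
    }
  };

  // Phase 1 (write pass): evaluate + hash each probe row in the prefetch group
  // and, if prefetching is enabled, prefetch the bucket the row will land in.
  // Phase 2 (read pass): probe using the cached hashes, without re-evaluating
  // or re-hashing. Interleaving hashing with the prefetches in phase 1 gives
  // the prefetched cache lines time to arrive before phase 2 touches them.
  int ProbePrefetchGroupSketch(const std::vector<int>& probe_group,
                               TPrefetchMode prefetch_mode,
                               const HashTableSketch& hash_tbl) {
    std::vector<uint32_t> cached_hashes;  // stand-in for the ExprValuesCache
    cached_hashes.reserve(probe_group.size());
    for (int row : probe_group) {
      uint32_t hash = static_cast<uint32_t>(std::hash<int>()(row));
      if (prefetch_mode != TPrefetchMode::NONE) hash_tbl.PrefetchBucket(hash);
      cached_hashes.push_back(hash);
    }
    int matches = 0;
    for (uint32_t hash : cached_hashes) {
      if (hash_tbl.Probe(hash)) ++matches;
    }
    return matches;
  }

  int main() {
    HashTableSketch ht(1024);
    std::vector<int> group = {1, 2, 3, 4};
    return ProbePrefetchGroupSketch(group, TPrefetchMode::HT_BUCKET, ht);
  }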
http://gerrit.cloudera.org:8080/#/c/2959/4/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

Line 151: FIXED_LEN_BUFFER_LIMIT
> Yeah, good point, we should just make limit default to -1 and handle the no
Yes, I picked the -1 suggestion. I think the compiler will constant-substitute limit if it's -1.

Line 155: + limit
> Yeah, but then FIXED_LEN_BUFFER_LIMIT was what was confusing to add to row_
Yes, limit is the number of rows to iterate over. Comments added. Also switched to using -1 as the default when no limit is specified.

http://gerrit.cloudera.org:8080/#/c/2959/4/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

Line 197: 1
> can't we use the enum value here?
Does it actually work in a .thrift file?

http://gerrit.cloudera.org:8080/#/c/2959/4/common/thrift/Types.thrift
File common/thrift/Types.thrift:

Line 128: HT_BUCKET
> what other modes will we have? prefetch the hash table data?
Yes. That's something to investigate in the future.

--
To view, visit http://gerrit.cloudera.org:8080/2959
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib42b93d99d09c833571e39d20d58c11ef73f3cc0
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Huaisi Xu <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
