Michael Ho has posted comments on this change. Change subject: Impala-3286: Prefetching For Phj Probing. ......................................................................
Patch Set 3: (53 comments) Thanks a lot for the comments. http://gerrit.cloudera.org:8080/#/c/2959/3/be/src/exec/hash-table.cc File be/src/exec/hash-table.cc: Line 172: non_var_len = values_buffer_.var_result_begin(); > it's confusing to switch polarities like this. either call the local variab Done Line 178: bytes > don't use bytes here, just like mentioned in hash-table.h Renamed to exprs_nullness. Use uint8_t instead of bool as sizeof(bool) is implementation dependent in C++ standard. http://gerrit.cloudera.org:8080/#/c/2959/3/be/src/exec/hash-table.h File be/src/exec/hash-table.h: Line 169: Last > last no longer makes sense because it's not the last row that was processed Done Line 174: Last > same comment Done Line 211: ValuesBuffer > How about ExprValuesCache? In the comments, you already refer to it as a c Done Line 228: Move the pointers to point the next entries in the arrays of cached values > Advance the pointers to the next row's cached state. Done Line 258: IsRowNull > IsRowIgnored()? Please see replies below about null_bitmap_. Line 269: ExprHashValue > ExprValuesHash (i.e. it's the hash over the set of values for the row). Done Line 272: SetExprHashValue > SetExprValuesHash Done Line 275: ExprValuesBufferLoc > ExprValuePtr(int expr_idx). not plural since it gives back one expr value, Done Line 278: ExprValuesNullByteLoc > ExprValueNullPtr Done Line 290: results_buffer_size_ > how about calling this: expr_values_bytes_per_row_? Done Line 293: build_expr_size_ > num_build_exprs_ Done Line 300: bool is_read_; > Rather than having different modes (which then requires a branch at every N Good point. Done. Line 304: expr_values_buffer_ > how about cur_expr_values_, to signal that this is the current values (i.e. Done Line 308: expr_values_null_byte_ > cur_expr_values_null_ Done Line 312: expr_hash_value_ > cur_expr_values_hash_ (i.e. it's the hash for the current row's expr_values Done Line 316: expr_values_buffer_array_ > expr_values_array_ Done Line 321: uint8_t > why not make this an array of bool to be clear that it's one value per, rat There is some comments at expr_values_null_bits_ in hash-table.h about compatibility with LLVM so uint8_t was used instead of bool, Googling suggests that sizeof(bool) is implementation defined so I guess it's safer to use uint8_t to guarantee each entry is 1 byte. The generated code kind of assumes that. Line 321: expr_values_null_byte_array_ > I've seen some comments elsewhere that say something about uint8_t better t Yes, there is some comment about it in the old code for the field expr_values_null_bits_. Line 321: expr_values_null_byte_array_ > expr_values_null_bool_array_ Renamed to expr_values_null_array_ Line 324: expr_hash_value_array_ > expr_values_hash_array_ for consistency Done Line 328: null_bitmap_ > this is kinda misleading since it doesn't really tell you if the row was nu Yes, I used a similar name in a previous iterator of this patch but then I realized that the row is not really ignored as certain join types may still include null row (e.g. left anti join) as output. Comments updated. Keep the same name for the field as it seems to fit more. Line 331: /// be a build row (during Insert()) or probe row (during FindProbeRow()). > this comment is misplaced Comment fixed. Line 332: expr_values_buffer_offsets_ > expr_values_offsets_ Done Line 337: var_result_begin_ > var_result_offset_ (to be consistent with expr_values_offsets_ since this i Done Line 389: void IR_ALWAYS_INLINE ResetPointers(); > seems like this should be deleted Done Line 392: MAX_EXPR_VALUES_ARRAY_SIZE > move to ExprValuesCache? Done Line 487: }; > if these are the values for the query option, you should define this in thr Done http://gerrit.cloudera.org:8080/#/c/2959/3/be/src/exec/hash-table.inline.h File be/src/exec/hash-table.inline.h: Line 41: results_buffer_size_ > how about calling this: expr_values_bytes_per_row_ Done Line 98: uint32_t hash = ht_ctx->values_buffer()->ExprHashValue(); > might as well move this into Probe() and remove from all the callers. There is a caller of Probe() (ResizeHashTables()) which passes ht_ctx as NULL :-(. Line 326: DCHECK_EQ((bucket_idx_ & ~(table_->num_buckets_ - 1)), 0); > why not move this DCHECK into PrefetchBucket()? The reason is that PrefetchBucket() derives the bucket_idx by masking out the upper bits of hash values. Here we are passing the bucket_idx as hash values but this should essentially be the same as we only care about the bottom bits. That's what this DCHECK is here to verify. http://gerrit.cloudera.org:8080/#/c/2959/3/be/src/exec/partitioned-hash-join-node-ir.cc File be/src/exec/partitioned-hash-join-node-ir.cc: Line 265: vals_buf->IsRowNull(idx)) > Would it make sense to make this part of the iterator? Either by storing th I did that in the previous version but I wanted to avoid storing a row index in the iterator as it's redundant. I believe iterating with an index may generate slightly better code. Line 268: // 1. No build rows -> Return this row. > Can you comment where condition 1 is handled. It seems to be missing below. It's actually in the 'non_emtpy_build_' check above. Let me add a comment. Line 282: BTS > 'null_probe_rows_' instead of BTS would be clearer Done Line 283: DCHECK(!status->ok()); > I think it would be clearer to return early in the error case, since we don Done. Line 306: DCHECK(skip_row || !status->ok()); > The error-case control flow is very mixed up with the non-error control flo Done Line 318: // Finished this batch. > Comment is redundant (implied by the AtEnd() condition). Maybe "At end of b Done Line 347: ++i; > in the other loop you used ->CurIdx() instead. Oh right. fixed. Line 377: // Evaluate and hash more rows if ht_ctx is empty. > This is a little weird. It seems like if we have a half-finished prefetch Yes, without the row evaluation and hashing, it doesn't seem to get enough other work interleaved with prefetching so it's a bit of wasted effort to prefetch in this case. Comments added. Line 398: process > "to process in the current batch." Done Line 417: int PartitionedHashJoinNode::ProcessProbeBatch( > Tangential comment: this function is only used when interpreting and doesn' Tried moving it out of this file but had trouble linking impalad. It may be easier if we move the template function above into a header file and include that in both files. I will refrain from doing that in this change but very good point though. Line 455: BufferedTupleStream > Not sure if you'll keep this argument, but you can make it: Done http://gerrit.cloudera.org:8080/#/c/2959/3/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: Line 653: BufferedTupleStream* build_rows[PARTITION_FANOUT]; > why do we do this? For slightly tighter cache footprint. The partition objects are probably large enough for all 16 of them to fit in a single cache line. Line 986: RETURN_IF_ERROR(QueryMaintenance(state)); > don't we still need to free build side local allocations more often because Yes, the new patch will free local allocations above. Just didn't get to it in this patch. http://gerrit.cloudera.org:8080/#/c/2959/3/be/src/exec/partitioned-hash-join-node.h File be/src/exec/partitioned-hash-join-node.h: Line 166: BufferedTupleStream** build_rows > what is this? Please see replies in partition-hash-join-node.cc. Line 260: HashTableCtx* ctx, : HashTableCtx::ValuesBuffer* vals_buf > do we have to pass both? Not really. Removed. It's inlined so it probably doesn't matter. Line 273: /// Prefetches the hash table buckets of the partition which 'hash' maps to. > is that true? doesn't it just prefetch one bucket? i.e.: Removed in the new patch. Line 285: int > can't we use the enum type rather than int? Done Line 285: int prefetch_mode); > prefetch_mode first since it's an in parameter. Done Line 289: HashTableCtx* ht_ctx, Status* status, int prefetch_mode); > same, keep in params and out params together. Done http://gerrit.cloudera.org:8080/#/c/2959/3/be/src/exec/partitioned-hash-join-node.inline.h File be/src/exec/partitioned-hash-join-node.inline.h: Line 29: PrepareForWrite > SetAtEnd() Done http://gerrit.cloudera.org:8080/#/c/2959/3/be/src/runtime/row-batch.h File be/src/runtime/row-batch.h: Line 151: int limit = FIXED_LEN_BUFFER_LIMIT) : > What's this about? Is this just some experimenting? That's the theoretical maximum number of rows if we only have a column of TINYINT. There may be a better upper bound defined somewhere but I couldn't find it. -- To view, visit http://gerrit.cloudera.org:8080/2959 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib42b93d99d09c833571e39d20d58c11ef73f3cc0 Gerrit-PatchSet: 3 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
