Michael Ho has posted comments on this change.

Change subject: Impala-3286: Prefetching For Phj Probing.
......................................................................


Patch Set 3:

(53 comments)

Thanks a lot for the comments.

http://gerrit.cloudera.org:8080/#/c/2959/3/be/src/exec/hash-table.cc
File be/src/exec/hash-table.cc:

Line 172: non_var_len = values_buffer_.var_result_begin();
> it's confusing to switch polarities like this. either call the local variab
Done


Line 178: bytes
> don't use bytes here, just like mentioned in hash-table.h
Renamed to exprs_nullness. Use uint8_t instead of bool as sizeof(bool) is 
implementation dependent in C++ standard.


http://gerrit.cloudera.org:8080/#/c/2959/3/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

Line 169: Last
> last no longer makes sense because it's not the last row that was processed
Done


Line 174: Last
> same comment
Done


Line 211: ValuesBuffer
> How about ExprValuesCache?  In the comments, you already refer to it as a c
Done


Line 228: Move the pointers to point the next entries in the arrays of cached 
values
> Advance the pointers to the next row's cached state.
Done


Line 258: IsRowNull
> IsRowIgnored()?
Please see replies below about null_bitmap_.


Line 269: ExprHashValue
> ExprValuesHash (i.e. it's the hash over the set of values for the row).
Done


Line 272: SetExprHashValue
> SetExprValuesHash
Done


Line 275: ExprValuesBufferLoc
> ExprValuePtr(int expr_idx). not plural since it gives back one expr value, 
Done


Line 278: ExprValuesNullByteLoc
> ExprValueNullPtr
Done


Line 290: results_buffer_size_
> how about calling this: expr_values_bytes_per_row_?
Done


Line 293: build_expr_size_
> num_build_exprs_
Done


Line 300:     bool is_read_;
> Rather than having different modes (which then requires a branch at every N
Good point. Done.


Line 304: expr_values_buffer_
> how about cur_expr_values_, to signal that this is the current values (i.e.
Done


Line 308: expr_values_null_byte_
> cur_expr_values_null_
Done


Line 312: expr_hash_value_
> cur_expr_values_hash_ (i.e. it's the hash for the current row's expr_values
Done


Line 316: expr_values_buffer_array_
> expr_values_array_
Done


Line 321: uint8_t
> why not make this an array of bool to be clear that it's one value per, rat
There is some comments at expr_values_null_bits_ in hash-table.h about 
compatibility with LLVM so uint8_t was used instead of bool, Googling suggests 
that sizeof(bool) is implementation defined so I guess it's safer to use 
uint8_t to guarantee each entry is 1 byte. The generated code kind of assumes 
that.


Line 321: expr_values_null_byte_array_
> I've seen some comments elsewhere that say something about uint8_t better t
Yes, there is some comment about it in the old code for the field 
expr_values_null_bits_.


Line 321: expr_values_null_byte_array_
> expr_values_null_bool_array_
Renamed to expr_values_null_array_


Line 324: expr_hash_value_array_
> expr_values_hash_array_ for consistency
Done


Line 328: null_bitmap_
> this is kinda misleading since it doesn't really tell you if the row was nu
Yes, I used a similar name in a previous iterator of this patch but then I 
realized that the row is not really ignored as certain join types may still 
include null row (e.g. left anti join) as output. Comments updated. Keep the 
same name for the field as it seems to fit more.


Line 331:     /// be a build row (during Insert()) or probe row (during 
FindProbeRow()).
> this comment is misplaced
Comment fixed.


Line 332: expr_values_buffer_offsets_
> expr_values_offsets_
Done


Line 337: var_result_begin_
> var_result_offset_ (to be consistent with expr_values_offsets_ since this i
Done


Line 389:   void IR_ALWAYS_INLINE ResetPointers();
> seems like this should be deleted
Done


Line 392: MAX_EXPR_VALUES_ARRAY_SIZE
> move to ExprValuesCache?
Done


Line 487:   };
> if these are the values for the query option, you should define this in thr
Done


http://gerrit.cloudera.org:8080/#/c/2959/3/be/src/exec/hash-table.inline.h
File be/src/exec/hash-table.inline.h:

Line 41: results_buffer_size_
> how about calling this: expr_values_bytes_per_row_
Done


Line 98:   uint32_t hash = ht_ctx->values_buffer()->ExprHashValue();
> might as well move this into Probe() and remove from all the callers.
There is a caller of Probe() (ResizeHashTables()) which passes ht_ctx as NULL 
:-(.


Line 326:     DCHECK_EQ((bucket_idx_ & ~(table_->num_buckets_ - 1)), 0);
> why not move this DCHECK into PrefetchBucket()?
The reason is that PrefetchBucket() derives the bucket_idx by masking out the 
upper bits of hash values. Here we are passing the bucket_idx as hash values 
but this should essentially be the same as we only care about the bottom bits. 
That's what this DCHECK is here to verify.


http://gerrit.cloudera.org:8080/#/c/2959/3/be/src/exec/partitioned-hash-join-node-ir.cc
File be/src/exec/partitioned-hash-join-node-ir.cc:

Line 265: vals_buf->IsRowNull(idx))
> Would it make sense to make this part of the iterator? Either by storing th
I did that in the previous version but I wanted to avoid storing a row index in 
the iterator as it's redundant. I believe iterating with an index may generate 
slightly better code.


Line 268:         // 1. No build rows -> Return this row.
> Can you comment where condition 1 is handled. It seems to be missing below.
It's actually in the 'non_emtpy_build_' check above. Let me add a comment.


Line 282: BTS
> 'null_probe_rows_' instead of BTS would be clearer
Done


Line 283:           DCHECK(!status->ok());
> I think it would be clearer to return early in the error case, since we don
Done.


Line 306:           DCHECK(skip_row || !status->ok());
> The error-case control flow is very mixed up with the non-error control flo
Done


Line 318:     // Finished this batch.
> Comment is redundant (implied by the AtEnd() condition). Maybe "At end of b
Done


Line 347:     ++i;
> in the other loop you used ->CurIdx() instead.
Oh right. fixed.


Line 377:     // Evaluate and hash more rows if ht_ctx is empty.
> This is a little weird. It seems like if we  have a half-finished prefetch 
Yes, without the row evaluation and hashing, it doesn't seem to get enough 
other work interleaved with prefetching so it's a bit of wasted effort to 
prefetch in this case. Comments added.


Line 398: process
> "to process in the current batch."
Done


Line 417: int PartitionedHashJoinNode::ProcessProbeBatch(
> Tangential comment: this function is only used when interpreting and doesn'
Tried moving it out of this file but had trouble linking impalad. It may be 
easier if we move the template function above into a header file and include 
that in both files. I will refrain from doing that in this change but very good 
point though.


Line 455: BufferedTupleStream
> Not sure if you'll keep this argument, but you can make it:
Done


http://gerrit.cloudera.org:8080/#/c/2959/3/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 653:   BufferedTupleStream* build_rows[PARTITION_FANOUT];
> why do we do this?
For slightly tighter cache footprint. The partition objects are probably large 
enough for all 16 of them to fit in a single cache line.


Line 986:     RETURN_IF_ERROR(QueryMaintenance(state));
> don't we still need to free build side local allocations more often because
Yes, the new patch will free local allocations above. Just didn't get to it in 
this patch.


http://gerrit.cloudera.org:8080/#/c/2959/3/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

Line 166: BufferedTupleStream** build_rows
> what is this?
Please see replies in partition-hash-join-node.cc.


Line 260: HashTableCtx* ctx,
        :       HashTableCtx::ValuesBuffer* vals_buf
> do we have to pass both?
Not really. Removed. It's inlined so it probably doesn't matter.


Line 273:   /// Prefetches the hash table buckets of the partition which 'hash' 
maps to.
> is that true? doesn't it just prefetch one bucket? i.e.:
Removed in the new patch.


Line 285: int
> can't we use the enum type rather than int?
Done


Line 285:       int prefetch_mode);
> prefetch_mode first since it's an in parameter.
Done


Line 289:       HashTableCtx* ht_ctx, Status* status, int prefetch_mode);
> same, keep in params and out params together.
Done


http://gerrit.cloudera.org:8080/#/c/2959/3/be/src/exec/partitioned-hash-join-node.inline.h
File be/src/exec/partitioned-hash-join-node.inline.h:

Line 29: PrepareForWrite
> SetAtEnd()
Done


http://gerrit.cloudera.org:8080/#/c/2959/3/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

Line 151:         int limit = FIXED_LEN_BUFFER_LIMIT) :
> What's this about? Is this just some experimenting?
That's the theoretical maximum number of rows if we only have a column of 
TINYINT. There may be a better upper bound defined somewhere but I couldn't 
find it.


-- 
To view, visit http://gerrit.cloudera.org:8080/2959
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib42b93d99d09c833571e39d20d58c11ef73f3cc0
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to