Michael Ho has posted comments on this change.

Change subject: IMPALA-3286: Prefetching for PHJ probing.
......................................................................


Patch Set 4:

(91 comments)

http://gerrit.cloudera.org:8080/#/c/2959/4/be/src/exec/hash-table.cc
File be/src/exec/hash-table.cc:

Line 126: int32_t
> be careful when you merge with Tim's recent patch that fixes this to uint32
Done


Line 138:     /// TODO: figure out which hash function to use. We need to generate uncorrelated
> This comment seems outdated. Let's remove it.
Done


Line 231: ;
> delete
Done


Line 241: state->batch_size()
> this is where the comment about finding a better maximum prefetch group siz
Done


Line 242: expr_values_bytes_per_row_
> Do we need to round up?
We should round down to be within the budget.


Line 242: expr_values_bytes_per_row_
> Maybe we should DCHECK that this is > 0?
Done
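
For reference, the capacity calculation now looks roughly like the following (standalone sketch, not the actual code; the 256KB budget and the helper name are just placeholders):

  // Sketch: how many rows worth of expression values fit in the budget.
  #include <algorithm>
  #include <cassert>

  static const int kExprValuesBudgetBytes = 256 * 1024;  // placeholder budget

  int ComputeCapacity(int batch_size, int expr_values_bytes_per_row) {
    assert(expr_values_bytes_per_row > 0);  // the DCHECK suggested above
    // Integer division rounds down, which keeps us within the budget.
    int capacity =
        std::min(batch_size, kExprValuesBudgetBytes / expr_values_bytes_per_row);
    // The real code may handle the degenerate case differently; here we just
    // make sure we always cache at least one row.
    return std::max(capacity, 1);
  }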


Line 281: num_build_exprs
> num_exprs?
Done here and everywhere.


Line 293:   return &cur_expr_values_null_[expr_idx];
> cur_expr_values_null_ + expr_idx
Done


Line 312: void HashTableCtx::ExprValuesCache::PrepareForRead() {
> // Remember for AtEnd().
Done


http://gerrit.cloudera.org:8080/#/c/2959/4/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

Line 140: IR_ALWAYS_INLINE
> why not ALWAYS_INLINE even for non-IR?
Done


Line 156: returning in *hash
> update indicating what the side effects are
Done


Line 185: .
> by the exec nodes.  (to make it clear that the hash table itself doesn't do
Done


Line 186: Rows
> A set of rows of a row batch
Done


Line 187: all
        :   /// the evaluated values, null byte and hash values to avoid re-evaluating the rows
        :   /// again during probing
> the results of expression evaluation for the rows in the prefetch set, to a
Done


Line 191: values
> expression evaluation results
Done


Line 196: evaluated
> each evaluated expression in each row.
Done


Line 202: nsertion or parsing
> for performing a write pass followed by a read pass
Yes. It's possible to restart a read pass after the iterator reaches the end, but in general restarting a read pass is error-prone. The current use case only needs one write pass and one read pass.
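
To make the one-write-pass / one-read-pass protocol concrete, the intended usage is roughly as follows (self-contained sketch; NextRow() and SetExprValuesHash() are illustrative names, not necessarily the final ones):

  #include <cstdint>
  #include <vector>

  // Minimal stand-in for ExprValuesCache; it only keeps the hash values here.
  struct ExprValuesCacheSketch {
    explicit ExprValuesCacheSketch(int capacity) : hashes_(capacity) {}

    void Reset() { idx_ = 0; size_ = hashes_.size(); }   // start a write pass
    void ResetForRead() { size_ = idx_; idx_ = 0; }      // remember end, rewind
    bool AtEnd() const { return idx_ >= size_; }
    void NextRow() { ++idx_; }                           // advance the iterator

    void SetExprValuesHash(uint32_t hash) { hashes_[idx_] = hash; }
    uint32_t ExprValuesHash() const { return hashes_[idx_]; }

   private:
    std::vector<uint32_t> hashes_;
    size_t idx_ = 0;
    size_t size_ = 0;
  };

  void Example(ExprValuesCacheSketch* cache, uint32_t (*eval_and_hash_next_row)()) {
    cache->Reset();                          // write pass: evaluate + hash rows
    while (!cache->AtEnd()) {
      cache->SetExprValuesHash(eval_and_hash_next_row());
      cache->NextRow();
    }
    cache->ResetForRead();                   // read pass: consume cached values
    while (!cache->AtEnd()) {
      uint32_t hash = cache->ExprValuesHash();
      (void)hash;                            // probe the hash table with 'hash'
      cache->NextRow();
    }
  }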


Line 203: or most of the data
        :   /// structures
> delete this (see comment around SetRowNull()
Done


Line 206: pointers
> iterator
Done


Line 207: pointers
> iterator
Done


Line 208: consumed
> this doesn't quite make sense for "write" mode.
Replaced with the word 'read' instead.


Line 211:   ///   row respectively.
> This comment can be removed since it seems redundant with the one for each 
Done


Line 215: .
> (e.g. if an expression value is a StringValue).
Done


Line 231: Resets
> Sets the iterator to the beginning to start a write pass by resetting
Comments updated with a shorter version.


Line 232: IR_ALWAYS_INLINE
Remove.


Line 234: Resets
> Sets the iterator to the beginning to start a read pass by resetting the po
Done


Line 235: IR_ALWAYS_INLINE
Remove.


Line 237:  
> row, by moving the pointers to the next entries  ...
Done


Line 237: pointers
> iterator
Done


Line 238: IR_ALWAYS_INLINE
Remove.


Line 256:     int IR_ALWAYS_INLINE CurIdx() const {
> make this private (see comments below).
Done


Line 260: all cached values have been consumed
> the read pass is complete.
Done


Line 265:     /// Resets the ExprValuesCache to indicate it is empty.
> I think this should say:
Actually, there is so much duplication with PrepareForWrite() that I will just remove PrepareForWrite() and call this function Reset(). I will also rename PrepareForRead() to ResetForRead().


Line 266:     void IR_ALWAYS_INLINE SetAtEnd();
> in the .cc file.
Done


Line 266:     void IR_ALWAYS_INLINE SetAtEnd();
> Where is this defined?
In hash-table.cc. Will remove IR_ALWAYS_INLINE.


Line 271:       return null_bitmap_.Get<false>(row_idx);
> to stick with the iterator interface (rather than this random access), how 
Done


Line 275:     void IR_ALWAYS_INLINE SetRowNull(int row_idx) {
> same comment
Done


Line 290: expressions
> should this be singular?
Done


Line 291: i
> might be nice to change these 'i' to 'expr_idx' for clarity here and below
Renamed.


Line 293: values
> singular?
Done


Line 294: expressions
> singular and no apostrophe
Done


Line 298: expressions'
> expression's
Done


Line 302:     friend class HashTableCtx;
> does the HashTableCtx access these private fields directly? okay to leave i
Yes, it does for now, during codegen and in HashVariableLenRow().


Line 305: ResetPointers
> how about calling it ResetIterator() to distinguish that it doesn't reset t
Done


Line 307: memory
> in bytes?
Done


Line 310: capacity
> number of rows worth of expression evaluation state
Done


Line 314: substitution
> constant substitution
Done


Line 318: num_build_exprs_
> maybe just call it num_exprs_ since it's also used by probe exprs?
Done


Line 320:  
> current row's cached expression values.
Done


Line 324: cached nullness of the
        :     /// expressions' values of the current row
> current row's cached nullness of each expression value.
Done


Line 328: cached expressions' hash values
        :     /// of the current row.
> cached hash value of the current row's expression values.
Done


Line 332: cached expressions'
        :     /// hash values.
> cached hash values.
Done


Line 336: rows'
> rows worth of
Done


Line 340: bits
> booleans (otherwise it's too easy to think this is a bitset).
Done


Line 340: rows'
> rows worth of
Done


Line 346: rows
> rows worth of hash values.
Done


Line 354: Stores the offsets into the expression values buffers of expressions' results.
> Maps from expression index to byte offset in a row's expression values buff
Done


Line 358: .
> for the row.
Done


Line 364: IR_ALWAYS_INLINE
> why not ALWAYS_INLINE?
Done


Line 432:  
> expression evaluation
Done


Line 433:   /// Used by PHJ to store results of batch evaluations of rows.
> The PHJ references will probably get stale. Maybe just say "Use to store re
Done


Line 539: 'row'
> what is this?
Comment fixed.


Line 541: values in 'ht_ctx'
> this seems state
Updated.


Line 736:   /// this function.
> this could use an updated explanation
Done


http://gerrit.cloudera.org:8080/#/c/2959/4/be/src/exec/hash-table.inline.h
File be/src/exec/hash-table.inline.h:

Line 320:     DCHECK_EQ((bucket_idx_ & ~(table_->num_buckets_ - 1)), 0);
> okay, how about a comment explaining this then:
Done


http://gerrit.cloudera.org:8080/#/c/2959/4/be/src/exec/partitioned-hash-join-node-ir.cc
File be/src/exec/partitioned-hash-join-node-ir.cc:

Line 254:   int idx = expr_vals_cache->CurIdx();
> delete this (see hash-table.h comments)
Done


Line 289:       uint32_t hash = expr_vals_cache->ExprValuesHash();
> FWIW I got some speedups for small hash tables in the agg by hoisting some 
Interesting. Maybe the branch is not very predictable?


Line 344:       if (LIKELY(hash_tbl != NULL)) hash_tbl->PrefetchBucket<true>(hash);
> should this have a prefetch_mode check?
Oops, this was accidentally removed while experimenting with patches to reduce codegen time.
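
The check I'm adding back looks roughly like this (self-contained sketch with simplified names; I'm assuming a NONE value in the prefetch mode enum):

  #include <cstdint>

  #ifndef LIKELY
  #define LIKELY(expr) __builtin_expect(!!(expr), 1)
  #endif

  // Simplified stand-in for TPrefetchMode (NONE is assumed here).
  enum class PrefetchMode { NONE, HT_BUCKET };

  struct HashTableSketch {
    template <bool READ>
    void PrefetchBucket(uint32_t hash) { (void)hash; /* __builtin_prefetch on the bucket */ }
  };

  // 'prefetch_mode' is constant-substituted at codegen time, so the extra
  // branch folds away when prefetching is disabled.
  void MaybePrefetch(PrefetchMode prefetch_mode, HashTableSketch* hash_tbl,
                     uint32_t hash) {
    if (prefetch_mode != PrefetchMode::NONE && LIKELY(hash_tbl != nullptr)) {
      hash_tbl->PrefetchBucket<true>(hash);
    }
  }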


Line 369: current_probe_row_ != NULL || !probe_batch_iterator.AtEnd();
> It has to handle current_probe_row_ carrying over from the previous Process
Yes, what Tim said. Such confusion will go away if we remove current_probe_row_ 
and make probe_batch_pos_ point to the current row instead of the row after the 
current row. Not happening in this patch though.


Line 378: not
        :     // empty
> 'partially full' instead of 'not empty' seems clearer.
Done


Line 380: works
> work instead of works.
Done


Line 381:     // prefetching to be effective.
> I don't understand what this comment is trying to say. first, the probe bat
As mentioned above, we may not have consumed the entire prefetch group's cached values, since the output batch may have filled up before all of the cached values were consumed. In that case, we won't have to re-evaluate the expressions; we will just feed off the remaining items in the prefetch group. When that happens, there aren't enough cycles of 'work' (i.e. evaluating the expressions and hashing) for the prefetched data to trickle in.

I rephrased the comment a bit (but failed to shorten it). See if it makes more sense now.
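
In code form, the flow I'm describing is roughly the following (self-contained sketch; the helper names other than EvalAndHashProbePrefetchGroup() are stand-ins):

  #include <cstdint>
  #include <vector>

  struct CacheSketch {                      // minimal stand-in for ExprValuesCache
    std::vector<uint32_t> hashes;
    size_t idx = 0;
    bool AtEnd() const { return idx >= hashes.size(); }
    uint32_t ExprValuesHash() const { return hashes[idx]; }
    void NextRow() { ++idx; }
  };

  // Stand-in: evaluates + hashes the next prefetch group of probe rows and
  // issues hash table bucket prefetches for them.
  void EvalAndHashProbePrefetchGroup(CacheSketch* cache) { (void)cache; }

  // Stand-in: probes the hash table with one cached row; may fill the out batch.
  void ProbeOneRow(uint32_t hash, bool* out_batch_full) { (void)hash; (void)out_batch_full; }

  void ProcessProbeBatchSketch(CacheSketch* cache, bool* out_batch_full) {
    while (!*out_batch_full) {
      // Feed off any leftovers from the previous call first; only when the
      // cache is fully consumed do we evaluate, hash and prefetch a new group.
      if (cache->AtEnd()) {
        EvalAndHashProbePrefetchGroup(cache);
        if (cache->AtEnd()) break;          // no more probe rows in this batch
      }
      ProbeOneRow(cache->ExprValuesHash(), out_batch_full);
      cache->NextRow();
    }
  }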


Line 502: batch_size
> cache_size or prefetch_size or something. batch is too overloaded.
Done


Line 509:           // processor dependent so we may need calibration at Impala startup time.
> the better place for this comment seems to be when we're calculating the ca
Done


Line 510:         hash_tbl_->PrefetchBucket<false>(expr_vals_cache->ExprValuesHash());
> Tabs?
Oops, removed.


Line 523:         TupleRow* row = batch_iter.Get();
> FWIW I got some speedups for small hash tables in the agg by hoisting some 
Interesting. Maybe the branch is not very predictable?


Line 527:       }
> shouldn't we terminate this loop when we hit expr_vals_cache->AtEnd()? or d
What Tim said. Added some comments in row-batch.h.


http://gerrit.cloudera.org:8080/#/c/2959/4/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 652: 
> if keeping this, please comment with something like:
Removed.


Line 912: during
> from
Comment removed.


Line 913: when probing the hash tables
> delete this - it's confusing and Equals() already tells you what you need t
Removed.


Line 913: We need to hold on to the local allocation of the
        :     // evaluated probe rows until this probe batch is done.
> This sentence is confusing because it's talking about what we need to do fo
Moved the comment to the point where we free local allocations of the probe side expressions.


Line 915: build_expr_ctxs_
> also, what about all the other ctxs QueryMaintenance frees? Instead of spec
The new patch keeps the QueryMaintenance() call here and frees local allocations of the probe expressions after getting a new probe batch. Please see the new comments.


Line 989: probe batch
> is this what's important? isn't it only safe once we reset the ExprValuesCa
It's safe in practice, but I see your point. I was just hoping that we could free up some memory before fetching the next row batch, but yes, it's better to reset the ExprValuesCache first so that no client can iterate over the freed memory.
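
Concretely, the ordering in the new patch is (sketch with stand-in types; names are placeholders):

  // Stand-ins; the real types are ExprValuesCache and the probe ExprContexts.
  struct CacheStub { void Reset() { /* invalidate cached entries */ } };
  struct ProbeExprsStub { void FreeLocalAllocations() { /* release memory */ } };

  void OnProbeBatchDone(CacheStub* cache, ProbeExprsStub* probe_exprs) {
    // Reset first so no caller can iterate over entries that point into
    // memory we are about to free, then free the local allocations.
    cache->Reset();
    probe_exprs->FreeLocalAllocations();
  }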


http://gerrit.cloudera.org:8080/#/c/2959/4/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

Line 167: build_rows
> comment for this.
Removed in the new patch.


Line 259:   /// that 'prefetch_mode' will be substituted with constants during codegen time.
> this comment needs updating. also it needs clarification that it only evalu
Done


Line 260: EvalAndHashProbeBatch
> Since this doesn't evaulate the whole probe batch, I think we should call i
Thanks. I picked EvalAndHashProbePrefetchGroup().


Line 615: build_rows
> can this use the array syntax to match the function decl above?
Removed in this patch.


http://gerrit.cloudera.org:8080/#/c/2959/4/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

Line 151: FIXED_LEN_BUFFER_LIMIT
> Yeah, good point, we should just make limit default to -1 and handle the no
Yes, I picked the -1 suggestion. I think the compiler will constant-substitute 'limit' if it's -1.


Line 155: + limit
> Yeah, but then FIXED_LEN_BUFFER_LIMIT was what was confusing to add to row_
Yes, 'limit' is the number of rows to iterate over. Comments added. Also switched to using -1 as the default when no limit is specified.
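
The simplified shape of the iterator I ended up with is roughly the following (sketch; presumably this maps onto the row batch iterator in row-batch.h, and the member names here are illustrative):

  #include <algorithm>

  // 'limit' is the number of rows to iterate over; -1 (the default) means all
  // remaining rows in the batch. When -1 is a compile-time constant, the
  // compiler can fold the extra branch away.
  class BatchIteratorSketch {
   public:
    BatchIteratorSketch(int num_rows, int start_idx, int limit = -1)
      : idx_(start_idx),
        end_(limit == -1 ? num_rows : std::min(num_rows, start_idx + limit)) {}

    bool AtEnd() const { return idx_ >= end_; }
    int Get() const { return idx_; }   // stand-in for returning a TupleRow*
    void Next() { ++idx_; }

   private:
    int idx_;
    int end_;
  };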


http://gerrit.cloudera.org:8080/#/c/2959/4/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

Line 197: 1
> can't we use the enum value here?
Does that actually work in a .thrift file?


http://gerrit.cloudera.org:8080/#/c/2959/4/common/thrift/Types.thrift
File common/thrift/Types.thrift:

Line 128:   HT_BUCKET
> what other modes will we have? prefetch the hash table data?
Yes. That's something to investigate in the future.


-- 
To view, visit http://gerrit.cloudera.org:8080/2959
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib42b93d99d09c833571e39d20d58c11ef73f3cc0
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Huaisi Xu <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
