Tim Armstrong has posted comments on this change.

Change subject: IMPALA-2736: Basic column-wise slot materialization in Parquet 
scanner.
......................................................................


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/2779/3/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 195:   RowBatch batch;
The embedded RowBatch is a little funky but this seems like the best solution 
for now.


Line 207:   //~ScratchTupleBatch() { pool->FreeAll(); }
Leftover commented-out code? Please either implement the destructor or remove the dead line.


Line 1729: ScratchTupleBatch
Consider adding a __restrict__ annotation. I suspect all of the member variable 
lookups (num_tuples, tuple_idx, etc.) have to go through the pointer on every 
iteration because the compiler can't infer that scratch_batch doesn't alias a 
pointer you've written through in the loop (-fno-strict-aliasing means that the 
compiler can't infer that the memory is disjoint based on pointer types).

Also consider making batch_ an argument and doing the same.

Not sure if this is a performance bottleneck, but we're doing a lot of memory 
copying in this loop, so if we can reduce extraneous loads and stores it will 
probably help.


Line 1738: scan_node_->
If this loop is performance-sensitive, consider hoisting this pointer 
indirection out of the loop (the compiler probably can't do it with 
-fno-strict-aliasing).


Line 1742: scanner_conjunct_ctxs_
Not your change, but we should consider just copying the scanner_conjunct_ctxs_ 
vector into the scan node to save a pointer indirection.

When I was previously working on the scanner it seemed fairly sensitive to 
cache/TLB effects, probably because each column has a lot of state: reducing 
pointer indirections and therefore the memory footprint of the working set 
seemed to help.

We could also hoist the empty() lookups into local variables outside of 
the loop (we still have the alias problem: the "this" pointer could have been 
written to through another pointer as far as the compiler is concerned).


Line 1823:     int num_row_to_commit = TransferScratchTuples(scratch_batch);
There's an assumption here that the remaining scratch rows will fit in the 
output batch. Can we assert this with a DCHECK?


http://gerrit.cloudera.org:8080/#/c/2779/3/be/src/util/rle-encoding.h
File be/src/util/rle-encoding.h:

Line 249:   // significantly better than UNLIKELY(literal_count_ == 0 && 
repeat_count_ == 0)
Hopefully as this progresses we might be able to use a batched interface that 
would have a less branchy inner loop and wouldn't be prone to these problems.


Line 250:   if (repeat_count_ == 0) {
Would (repeat_count_ + literal_count_) == 0 be better still? Not sure if this 
branch is still showing up in profiling.


-- 
To view, visit http://gerrit.cloudera.org:8080/2779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I72a613fa805c542e39df20588fb25c57b5f139aa
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Skye Wanderman-Milne <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
