Tim Armstrong has posted comments on this change.

Change subject: PREVIEW: Basic column-wise slot materialization in Parquet 
scanner.
......................................................................


Patch Set 1:

(3 comments)

Is the long-term plan to keep the scratch batch in the row-wise format? It 
seems like this should work ok cache-wise (the batch should fit in cache, and 
the memory access pattern will have gaps but a regular stride), but having the 
values densely packed would allow some optimisations down the road. I suspect 
dense packing would be slightly faster in the short term, but I don't know 
whether it would have a long-term impact.
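
To illustrate what I mean by the two layouts, here is a minimal, self-contained 
C++ sketch. The ScratchRowBatch/DenseBigIntColumn types are hypothetical 
stand-ins, not Impala's actual classes; the point is only the access pattern 
difference.

// Row-wise scratch batch: one slot per tuple, so reading a single column walks
// memory at a regular stride of tuple_size bytes (gaps, but predictable).
#include <cstdint>
#include <cstdio>
#include <vector>

struct ScratchRowBatch {
  std::vector<uint8_t> tuples;  // num_rows * tuple_size bytes
  int tuple_size;

  int64_t SumBigIntSlot(int slot_offset, int num_rows) const {
    int64_t sum = 0;
    for (int i = 0; i < num_rows; ++i) {
      const uint8_t* tuple = tuples.data() + i * tuple_size;
      // Strided slot access, mirroring how a conjunct would read one column.
      sum += *reinterpret_cast<const int64_t*>(tuple + slot_offset);
    }
    return sum;
  }
};

// Densely packed column: values are contiguous, which keeps every cache-line
// byte useful for the column being evaluated and vectorizes trivially.
struct DenseBigIntColumn {
  std::vector<int64_t> values;

  int64_t Sum() const {
    int64_t sum = 0;
    for (int64_t v : values) sum += v;
    return sum;
  }
};

int main() {
  ScratchRowBatch rb{std::vector<uint8_t>(4 * 16, 0), 16};
  DenseBigIntColumn col{{1, 2, 3, 4}};
  std::printf("%lld %lld\n", (long long)rb.SumBigIntSlot(0, 4),
              (long long)col.Sum());
  return 0;
}

The strided loop should behave fine as long as the batch fits in cache, but the 
contiguous layout is what would make later SIMD-style optimisations 
straightforward.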

http://gerrit.cloudera.org:8080/#/c/2779/1/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 1732:   // and return an output batch with relatively few rows.
The TODO describes the current intended behaviour, so that sounds right. I 
think sending small batches up the tree is ok for selective scans.


Line 1737:       // Optimization for scans with selective filters/conjuncts: 
None of the
Is this factoring in accumulated disk buffers?


Line 1829: ReadValueBatch
Ignoring return value?

I think we need to be careful about propagating errors here: it could end badly 
if there's a read error and we then evaluate conjuncts or filters over bogus 
data.

The existing code avoids this by checking for errors every row.
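
For concreteness, here is a minimal, self-contained sketch of the kind of 
batch-level check I mean. The Status type, the ReadValueBatch signature, and 
the helper names are hypothetical stand-ins, not the code in this patch.

#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct Status {
  std::string msg;
  bool ok() const { return msg.empty(); }
  static Status OK() { return {}; }
};

// Stand-in for the column reader's batched read; on failure the contents of
// 'values' must be treated as undefined.
Status ReadValueBatch(std::vector<int64_t>* values, int num_to_read) {
  values->assign(num_to_read, 42);  // pretend we decoded something
  return Status::OK();
}

// Stand-in for conjunct/filter evaluation over a materialized value.
bool EvalConjuncts(int64_t value) { return value > 0; }

Status MaterializeBatch(std::vector<int64_t>* out, int num_to_read) {
  std::vector<int64_t> scratch;
  Status status = ReadValueBatch(&scratch, num_to_read);
  // Propagate the error before touching the (possibly bogus) values, so
  // conjuncts and runtime filters never run over garbage. This mirrors the
  // guarantee the per-row path gives by checking for errors on every row.
  if (!status.ok()) return status;
  for (int64_t v : scratch) {
    if (EvalConjuncts(v)) out->push_back(v);
  }
  return Status::OK();
}

int main() {
  std::vector<int64_t> out;
  Status s = MaterializeBatch(&out, 8);
  std::printf("ok=%d rows=%zu\n", s.ok() ? 1 : 0, out.size());
  return 0;
}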


-- 
To view, visit http://gerrit.cloudera.org:8080/2779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I72a613fa805c542e39df20588fb25c57b5f139aa
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
