Tim Armstrong has posted comments on this change.

Change subject: PREVIEW: Basic column-wise slot materialization in Parquet scanner.
......................................................................


Patch Set 1:

(3 comments)

Is the long-term plan to keep the scratch batch in the row-wise format? It seems like this should work OK cache-wise (the batch should fit in cache, and the memory access pattern has gaps but a regular stride), but having the values densely packed would allow some optimisations down the road. I suspect it would be slightly faster in the short term, but I don't know whether it would have a long-term impact.
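To make the trade-off concrete, here's a rough sketch (hypothetical code, not from this patch; all names are made up) of summing one column's values from a row-wise scratch batch with a fixed stride versus from a densely packed column array:

  // Hypothetical illustration (not from the patch): reading one column's
  // values from a row-wise scratch batch vs. a densely packed column.
  #include <cstdint>
  #include <cstring>
  #include <vector>

  // Row-wise scratch batch: values of the same column sit tuple_size bytes
  // apart, so the scan strides through memory with gaps between values.
  int64_t SumColumnRowWise(const uint8_t* scratch_tuples, int num_rows,
                           int tuple_size, int slot_offset) {
    int64_t sum = 0;
    for (int i = 0; i < num_rows; ++i) {
      int64_t val;
      memcpy(&val, scratch_tuples + i * tuple_size + slot_offset, sizeof(val));
      sum += val;
    }
    return sum;
  }

  // Densely packed column: values are contiguous, which makes vectorisation
  // and sequential prefetching straightforward.
  int64_t SumColumnDense(const std::vector<int64_t>& column) {
    int64_t sum = 0;
    for (int64_t v : column) sum += v;
    return sum;
  }

The row-wise version should still be reasonably cache-friendly for a batch-sized scratch area; the dense version is just easier for the compiler to vectorise.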
http://gerrit.cloudera.org:8080/#/c/2779/1/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 1732: // and return an output batch with relatively few rows.
The TODO describes the current intended behaviour, so that sounds right. I think sending small batches up the tree is OK for selective scans.

Line 1737: // Optimization for scans with selective filters/conjuncts: None of the
Is this factoring in accumulated disk buffers?

Line 1829: ReadValueBatch
Is the return value being ignored here? I think we need to be careful about propagating errors: it could end badly if there's a read error and we then evaluate conjuncts or filters over bogus data. The existing code avoids this by checking for errors on every row.
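Something along these lines is what I have in mind (hypothetical sketch, not this patch's actual code; the Status struct and stub names are made up) -- check the status of the batched read before evaluating anything over the materialized values:

  // Hypothetical sketch: propagate the read status before touching the
  // materialized values, so conjuncts/filters never see bogus data.
  #include <string>

  struct Status {
    bool ok;
    std::string msg;
    static Status OK() { return Status{true, ""}; }
  };

  // Stand-ins for the batched column read and conjunct evaluation.
  Status ReadValueBatchStub(int* num_values_read) {
    *num_values_read = 0;  // Pretend nothing was read.
    return Status::OK();
  }
  bool EvalConjunctsStub(int num_values) { return num_values > 0; }

  Status MaterializeAndFilterBatch() {
    int num_values = 0;
    Status status = ReadValueBatchStub(&num_values);
    // Bail out on a read error instead of evaluating conjuncts/filters
    // over partially materialized values.
    if (!status.ok) return status;
    EvalConjunctsStub(num_values);
    return Status::OK();
  }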
--
To view, visit http://gerrit.cloudera.org:8080/2779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I72a613fa805c542e39df20588fb25c57b5f139aa
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes