Alex Behm has posted comments on this change. Change subject: IMPALA-2736: Basic column-wise slot materialization in Parquet scanner. ......................................................................
Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/2779/2/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 706: while (LIKELY(val_count < max_values && !RowGroupAtEnd() && continue_execution)) { > Hm, ok, fair enough, seems like it's trickier than I initially thought. Actually, we do have the number of top-level values in a row group. But using those would mean we'd have to specialize the implementation of this function based on IN_COLLECTION, i.e., if we are in a collection we'd still have to revert to this version of the code. Also, we might not be able to check for the same error conditions as before (e.g. did we read more values than were promised in the row group metadata?) What do you think? Line 1752: if (UNLIKELY( > I think the check is meant to occur less than once per batch in the Assembl Makes sense. I'm a little reluctant to address the issue in this patch because it's at least not totally trivial. I added a TODO. Let me know if you are not ok with deferring. Line 1912: ReadCollectionItem(column_readers, materialize_tuple, pool, tuple); > Ah ok, I hadn't thought about the additional scratch batches, I think becau I added a blurb in the header how slots are materialized. -- To view, visit http://gerrit.cloudera.org:8080/2779 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I72a613fa805c542e39df20588fb25c57b5f139aa Gerrit-PatchSet: 2 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Alex Behm <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Skye Wanderman-Milne <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
