Tim Armstrong has posted comments on this change. Change subject: IMPALA-2736: Basic column-wise slot materialization in Parquet scanner. ......................................................................
Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/2779/2/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 706: while (LIKELY(val_count < max_values && !RowGroupAtEnd() && continue_execution)) { > A row group consists of multiple data blocks. A data block contains metadat Hm, ok, fair enough, seems like it's trickier than I initially thought. Line 1752: if (UNLIKELY( > Sorry, I'm not following you suggestion. Can you clarify? There's a modulo I think the check is meant to occur less than once per batch in the AssembleRows() loop, so it could be done in that loop. The modulo logic would have to change a bit to track how many rows were processed since last check. It probably isn't high-priority to fix this but I feel like it makes it harder to reason performance of some of the hot loops with this extrastuffin here.. Line 1912: ReadCollectionItem(column_readers, materialize_tuple, pool, tuple); > That's a fair point, so let's discuss it. I had decided against applying th Ah ok, I hadn't thought about the additional scratch batches, I think because I wasn't thinking about conjuncts on nested collections. Maybe add a comment to explain why it's materialised row-wise? -- To view, visit http://gerrit.cloudera.org:8080/2779 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I72a613fa805c542e39df20588fb25c57b5f139aa Gerrit-PatchSet: 2 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Alex Behm <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Skye Wanderman-Milne <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
