Tim Armstrong has posted comments on this change.

Change subject: IMPALA-2736: Basic column-wise slot materialization in Parquet 
scanner.
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/2779/2/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 706:     while (LIKELY(val_count < max_values && !RowGroupAtEnd() && 
continue_execution)) {
> A row group consists of multiple data blocks. A data block contains metadat
Hm, ok, fair enough, seems like it's trickier than I initially thought.


Line 1752:     if (UNLIKELY(
> Sorry, I'm not following you suggestion. Can you clarify? There's a modulo 
I think the check is meant to occur less than once per batch in the 
AssembleRows() loop, so it could be done in that loop.

The modulo logic would have to change a bit to track how many rows were 
processed since last check. It probably isn't high-priority to fix this but I 
feel like it makes it harder to reason performance of some of the hot loops 
with this extrastuffin here..


Line 1912:           ReadCollectionItem(column_readers, materialize_tuple, 
pool, tuple);
> That's a fair point, so let's discuss it. I had decided against applying th
Ah ok, I hadn't thought about the additional scratch batches, I think because I 
wasn't thinking about conjuncts on nested collections. Maybe add a comment to 
explain why it's materialised row-wise?


-- 
To view, visit http://gerrit.cloudera.org:8080/2779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I72a613fa805c542e39df20588fb25c57b5f139aa
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Skye Wanderman-Milne <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to