Tim Armstrong has posted comments on this change.

Change subject: IMPALA-2736: Basic column-wise slot materialization in Parquet scanner.
......................................................................

Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/2779/3/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 1729: int HdfsParquetScanner::TransferScratchTuples(ScratchTupleBatch* scratch_batch) {
> Thanks for the suggestions, Tim. I rewrote this function to be more perform

I think the new version is just as readable, and it's way easier to reason about performance, so thumbs up from me. I'm not in favour of optimising to death, but I think it's good to write hot loops in a way that makes it somewhat feasible to understand the performance characteristics of what is actually going to execute on the CPU.


http://gerrit.cloudera.org:8080/#/c/2779/3/be/src/util/rle-encoding.h
File be/src/util/rle-encoding.h:

Line 249: // significantly better than UNLIKELY(literal_count_ == 0 && repeat_count_ == 0)
> Correct. I'm already working on batch-reading and caching the def/rep level

Nice.


Line 250: if (repeat_count_ == 0) {
> Actually Mostafa tried (repeat_count_ & literal_count_) == 0 and it was sti

You mean (repeat_count_ | literal_count_) ? I'm pretty sure & is incorrect there, since the result of & is always 0 if either operand is 0, so the == 0 test would pass whenever just one of the counts is 0. Anyway, I think you have bigger fish to fry than this :)


--
To view, visit http://gerrit.cloudera.org:8080/2779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I72a613fa805c542e39df20588fb25c57b5f139aa
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Skye Wanderman-Milne <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
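
A note on the Line 250 comment: the difference between the & and | forms can be checked with a small standalone sketch. This is not the Impala RleDecoder code. The function names are hypothetical, and the counters are assumed here to be unsigned 32-bit integers purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Reference predicate: true only when both counters are zero.
inline bool BothZero(uint32_t repeat_count, uint32_t literal_count) {
  return repeat_count == 0 && literal_count == 0;
}

// OR form: any set bit in either counter survives the OR, so the result is
// zero only when both counters are zero. Equivalent to BothZero.
inline bool BothZeroViaOr(uint32_t repeat_count, uint32_t literal_count) {
  return (repeat_count | literal_count) == 0;
}

// AND form: the result is zero whenever the counters share no set bits, which
// includes every case where just one of them is zero. Not equivalent.
inline bool ZeroViaAnd(uint32_t repeat_count, uint32_t literal_count) {
  return (repeat_count & literal_count) == 0;
}

// Counts the inputs in [0, limit) x [0, limit) on which `candidate`
// disagrees with the reference predicate BothZero.
template <typename Pred>
int CountMismatches(Pred candidate, uint32_t limit) {
  int mismatches = 0;
  for (uint32_t r = 0; r < limit; ++r) {
    for (uint32_t l = 0; l < limit; ++l) {
      if (candidate(r, l) != BothZero(r, l)) ++mismatches;
    }
  }
  return mismatches;
}

// Spot checks of the cases discussed in the review comment.
inline void SelfCheck() {
  assert(BothZeroViaOr(0, 0));
  assert(!BothZeroViaOr(0, 5));  // one counter non-zero: OR form rejects it
  assert(ZeroViaAnd(0, 5));      // ...but the AND form wrongly accepts it
  assert(ZeroViaAnd(1, 2));      // disjoint bits also pass the AND form
}
```

Calling CountMismatches(BothZeroViaOr, 64) finds no disagreement with the && form over that range, while the AND form disagrees on inputs such as (0, 5) and (1, 2). Whether the | form is actually faster than the separate comparisons is a measurement question this sketch does not address.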
