Tim Armstrong has posted comments on this change.

Change subject: Optimized ReadValueBatch() for Parquet scalar column readers.
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/2843/2/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 301: uint8_t* cached_levels_;
> Good question. Had thought about it and opted for the safer solution. For l
In general, it's best not to add more untracked memory. It's only 1024 bytes per column, which is small compared to other overhead such as dictionaries, so if there's a perf benefit it's probably OK. I'm OK with the MemPool approach too.

Line 1872: if (col_reader->IsCollectionReader() || col_reader->IsBoolColumnReader()) {
> I added one, but not sure if it's clearer/better.
I think it's better for this code not to know about the specifics of all the column reader types, even if it's still ugly.


-- 
To view, visit http://gerrit.cloudera.org:8080/2843
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I21fa9b050a45f2dd45cc0091ea5b008d3c0a3f30
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
