Tim Armstrong has posted comments on this change.

Change subject: Optimized ReadValueBatch() for Parquet scalar column readers.
......................................................................


Patch Set 2:

(4 comments)

The changes make a lot of sense

http://gerrit.cloudera.org:8080/#/c/2843/2/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 276:   /// Decodes and caches the next batch of levels. Resets members associated with the cache.
Is it valid to call this when the prev batch hasn't been totally consumed?


Line 301:   cached_levels_
Would it make sense to just have this be a constant-sized array with e.g. 1024 entries. Could save some of the plumbing of the MemPool, reduce indirection and make it more tunable.


Line 854:   /// It assumes a data page with remaining values is available, and that the def/rep
Can we assert any of these preconditions with DCHECKs?


Line 1872:   if (col_reader->IsCollectionReader() || col_reader->IsBoolColumnReader()) {
Maybe should have a NeedsSeeding() method?


-- 
To view, visit http://gerrit.cloudera.org:8080/2843
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I21fa9b050a45f2dd45cc0091ea5b008d3c0a3f30
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
