Alex Behm has posted comments on this change.

Change subject: Optimized ReadValueBatch() for Parquet scalar column readers.
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/2843/2/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 301:   uint8_t* cached_levels_;
> In general, it's best not to add more untracked memory.
Could be 2x1024 bytes per column. Agree it's small. I'm thinking we should keep 
the MemPool approach for now to be on the safe side.

I'm still investigating some other options with respect to dealing with the 
re/def levels, so we may end up with an entirely different approach. For 
example, we could use a bitmap (NULL vs. not NULL) instead of the def levels in 
some cases, or alternatively set the NULL bits directly in the scratch tuples 
instead of going through the def level cache indirection. Similar ideas can be 
applied to the other uses of def levels and rep levels.


Line 1872:       if (col_reader->IsCollectionReader() || 
col_reader->IsBoolColumnReader()) {
> I think it's better not having this code know about the specifics of all th
Makes sense and works for me :)


-- 
To view, visit http://gerrit.cloudera.org:8080/2843
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I21fa9b050a45f2dd45cc0091ea5b008d3c0a3f30
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to