[
https://issues.apache.org/jira/browse/KUDU-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869768#comment-16869768
]
Andrew Wong commented on KUDU-2852:
-----------------------------------
To orient folks a bit with respect to the rest of the scan path: the
CFileSet::Iterator (in tablet/cfile_set.cc) iterates over columns and calls
CFileIterator::Scan() (cfile/cfile_reader.cc). Therein, if the column's
encoding supports decoder-level evaluation, the iterator will call
CopyNextAndEval(), pushing down the predicate, and filling in the selection
vector and the column block based on what it finds. If decoder-level evaluation
isn't supported by the decoder, we just call CopyNextValues(), materializing
the unencoded data, and then evaluate the predicate on the entire block
(evaluation found in MaterializingIterator::MaterializeBlock() in
common/generic_iterators.cc).
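
The two paths described above can be sketched roughly as follows. This is a toy model, not Kudu's actual code: ToyDecoder, EqualsPredicate, SupportsDecoderEval() and this Scan() function are hypothetical names invented for the illustration, and only the method names CopyNextAndEval() and CopyNextValues() are borrowed from the description above. The toy decoder's choice to write only matching cells into the block is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical predicate: a cell matches when it equals `target`.
struct EqualsPredicate {
  int32_t target;
  bool Matches(int32_t v) const { return v == target; }
};

// Hypothetical decoder holding already-decoded cells for simplicity.
class ToyDecoder {
 public:
  ToyDecoder(std::vector<int32_t> cells, bool supports_eval)
      : cells_(std::move(cells)), supports_eval_(supports_eval) {}

  bool SupportsDecoderEval() const { return supports_eval_; }

  // Materialize every cell into the block; no predicate involved.
  void CopyNextValues(std::vector<int32_t>* block) const { *block = cells_; }

  // Evaluate while decoding: set selection bits and copy only matching cells.
  void CopyNextAndEval(const EqualsPredicate& pred,
                       std::vector<int32_t>* block,
                       std::vector<bool>* sel) const {
    block->assign(cells_.size(), 0);
    sel->assign(cells_.size(), false);
    for (size_t i = 0; i < cells_.size(); ++i) {
      if (pred.Matches(cells_[i])) {
        (*sel)[i] = true;
        (*block)[i] = cells_[i];
      }
    }
  }

 private:
  std::vector<int32_t> cells_;
  bool supports_eval_;
};

// Choose the pushed-down path when available, else materialize then evaluate.
std::vector<bool> Scan(const ToyDecoder& dec, const EqualsPredicate& pred,
                       std::vector<int32_t>* block) {
  assert(block != nullptr);
  std::vector<bool> sel;
  if (dec.SupportsDecoderEval()) {
    dec.CopyNextAndEval(pred, block, &sel);
  } else {
    dec.CopyNextValues(block);
    sel.resize(block->size());
    for (size_t i = 0; i < block->size(); ++i) {
      sel[i] = pred.Matches((*block)[i]);
    }
  }
  return sel;
}
```

Either path should produce the same selection vector; the difference is only in how much data gets materialized along the way.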

Implementing this would be a matter of implementing
BlockDecoder::CopyNextAndEval() (defined in cfile/block_encodings.h) for e.g.
the RLE decoder (cfile/rle_block.h). For RLE, evaluation could be pushed down
by using GetNextRun() in CopyNextAndEval() instead of Get() that we use now for
CopyNextValues(). An example implementation can be found for dictionary
encoding at cfile/binary_dict_block.cc.
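
As a rough illustration of the per-run idea, here is a minimal standalone sketch. It does not use Kudu's actual classes: Run, RangePredicate, EvalAfterMaterializing() and EvalPerRun() are hypothetical names made up for this example, and a real implementation would presumably sit behind BlockDecoder::CopyNextAndEval() and pull runs from the RLE decoder rather than from a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one RLE run: `length` consecutive cells
// that all hold `value`.
struct Run {
  int32_t value;
  size_t length;
};

// Hypothetical range predicate: matches when lower <= v < upper.
struct RangePredicate {
  int32_t lower;
  int32_t upper;
  bool Matches(int32_t v) const { return v >= lower && v < upper; }
};

// Baseline: expand every run into cells, then evaluate cell by cell.
std::vector<bool> EvalAfterMaterializing(const std::vector<Run>& runs,
                                         const RangePredicate& pred) {
  std::vector<int32_t> cells;
  for (const Run& r : runs) {
    cells.insert(cells.end(), r.length, r.value);
  }
  std::vector<bool> sel(cells.size());
  for (size_t i = 0; i < cells.size(); ++i) {
    sel[i] = pred.Matches(cells[i]);
  }
  return sel;
}

// Pushed down: evaluate the predicate once per run and apply the result
// to all rows the run covers. `num_evals` reports how many evaluations ran.
std::vector<bool> EvalPerRun(const std::vector<Run>& runs,
                             const RangePredicate& pred,
                             size_t* num_evals) {
  assert(num_evals != nullptr);
  *num_evals = 0;
  std::vector<bool> sel;
  for (const Run& r : runs) {
    const bool match = pred.Matches(r.value);
    ++*num_evals;
    sel.insert(sel.end(), r.length, match);
  }
  return sel;
}
```

The number of predicate evaluations in EvalPerRun() scales with the number of runs rather than the number of rows, which is where the savings would come from on long runs.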
> Push predicate evaluation into more CFile decoders
> --------------------------------------------------
>
> Key: KUDU-2852
> URL: https://issues.apache.org/jira/browse/KUDU-2852
> Project: Kudu
> Issue Type: Improvement
> Components: cfile, perf
> Reporter: Andrew Wong
> Assignee: Mitch Barnett
> Priority: Major
> Labels: newbie
>
> Commit c0f3727 added an optimization to push predicate evaluation into the
> CFile decoders without fully materializing the contents of each cblock. It
> did this with dictionary-encoded blocks, but the optimization can be applied
> to any other encoding types too.
> RLE decoders are low-hanging fruit: they should be able to evaluate the
> predicate for each run instead of materializing each cell and then applying
> the predicate.
> KUDU-736 also notes that we may be able to apply some predicates on
> bitshuffled data.
>