[ 
https://issues.apache.org/jira/browse/KUDU-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869768#comment-16869768
 ] 

Andrew Wong commented on KUDU-2852:
-----------------------------------

To orient folks a bit with respect to the rest of the scan path: the 
CFileSet::Iterator (in tablet/cfile_set.cc) iterates over columns and calls 
CFileIterator::Scan() (cfile/cfile_reader.cc) on each. Therein, if the 
column's encoding supports decoder-level evaluation, the iterator calls 
CopyNextAndEval(), pushing down the predicate and filling in the selection 
vector and the column block based on what it finds. If the decoder doesn't 
support decoder-level evaluation, we just call CopyNextValues(), materializing 
the unencoded data, and then evaluate the predicate on the entire block (that 
evaluation is in MaterializingIterator::MaterializeBlock() in 
common/generic_iterators.cc).
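
As a rough illustration of the two paths described above, here is a minimal, 
self-contained sketch. ToyDecoder, Predicate, and ScanColumn are hypothetical 
stand-ins invented for this example; they are not Kudu's actual classes or 
signatures, and only the CopyNextValues()/CopyNextAndEval() names are borrowed 
from the real code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical predicate type; Kudu's real predicates are richer than this.
using Predicate = std::function<bool(int32_t)>;

// Hypothetical decoder holding one block's worth of already-decoded values.
struct ToyDecoder {
  std::vector<int32_t> values;
  bool supports_eval;  // whether decoder-level evaluation is available

  // Materialize-only path: copy every value out, no filtering.
  void CopyNextValues(std::vector<int32_t>* dst) const { *dst = values; }

  // Pushdown path: evaluate while decoding, clearing selection bits for rows
  // that fail and copying only the rows that are still selected.
  void CopyNextAndEval(const Predicate& pred, std::vector<int32_t>* dst,
                       std::vector<bool>* sel) const {
    assert(sel->size() == values.size());
    dst->resize(values.size());
    for (size_t i = 0; i < values.size(); ++i) {
      if (!(*sel)[i]) continue;  // already filtered out by an earlier predicate
      (*sel)[i] = pred(values[i]);
      if ((*sel)[i]) (*dst)[i] = values[i];
    }
  }
};

// Hypothetical analogue of the dispatch described above.
void ScanColumn(const ToyDecoder& dec, const Predicate& pred,
                std::vector<int32_t>* block, std::vector<bool>* sel) {
  if (dec.supports_eval) {
    dec.CopyNextAndEval(pred, block, sel);
    return;
  }
  // Fallback: materialize everything, then evaluate over the whole block.
  dec.CopyNextValues(block);
  for (size_t i = 0; i < block->size(); ++i) {
    if ((*sel)[i] && !pred((*block)[i])) (*sel)[i] = false;
  }
}
```

Both paths should leave the same selection vector behind; the pushdown path 
just avoids copying cells that are already known not to match.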

 

Adding this for another encoding would be a matter of implementing 
BlockDecoder::CopyNextAndEval() (defined in cfile/block_encodings.h) for e.g. 
the RLE decoder (cfile/rle_block.h). For RLE, evaluation could be pushed down 
by using GetNextRun() in CopyNextAndEval() instead of the Get() we use now in 
CopyNextValues(). An example implementation can be found for dictionary 
encoding at cfile/binary_dict_block.cc.
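
To make the RLE idea a bit more concrete, here is a small standalone sketch of 
evaluating the predicate once per run rather than once per cell. Everything in 
it (Run, ToyRleDecoder, CopyNextAndEvalRle, and the shape of this GetNextRun()) 
is a hypothetical stand-in written for illustration; it does not reproduce the 
real decoder's interface or the real CopyNextAndEval() signature.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical run: `length` consecutive copies of `value`.
struct Run {
  int32_t value;
  size_t length;
};

// Hypothetical decoder over pre-parsed runs (a real one would read bits).
class ToyRleDecoder {
 public:
  explicit ToyRleDecoder(std::vector<Run> runs) : runs_(std::move(runs)) {}

  // Yields up to max_run repeats of the next value; returns 0 when exhausted.
  size_t GetNextRun(int32_t* val, size_t max_run) {
    while (idx_ < runs_.size() && runs_[idx_].length == used_) {
      ++idx_;
      used_ = 0;
    }
    if (idx_ == runs_.size() || max_run == 0) return 0;
    size_t n = std::min(max_run, runs_[idx_].length - used_);
    *val = runs_[idx_].value;
    used_ += n;
    return n;
  }

 private:
  std::vector<Run> runs_;
  size_t idx_ = 0;   // current run
  size_t used_ = 0;  // cells already consumed from the current run
};

// Evaluates `pred` once per run and applies the result to the whole run.
// Returns the number of predicate evaluations performed.
size_t CopyNextAndEvalRle(ToyRleDecoder* dec,
                          const std::function<bool(int32_t)>& pred,
                          std::vector<int32_t>* dst, std::vector<bool>* sel) {
  assert(dst->size() == sel->size());
  size_t row = 0;
  size_t evals = 0;
  while (row < sel->size()) {
    int32_t val = 0;
    size_t n = dec->GetNextRun(&val, sel->size() - row);
    if (n == 0) break;  // decoder exhausted; remaining rows are left untouched
    bool match = pred(val);
    ++evals;
    for (size_t i = row; i < row + n; ++i) {
      if (!match) {
        (*sel)[i] = false;  // whole run fails the predicate
      } else if ((*sel)[i]) {
        (*dst)[i] = val;    // copy only rows that are still selected
      }
    }
    row += n;
  }
  return evals;
}
```

With runs of (7 x3), (2 x2), (7 x1) and a predicate of "value == 7", this does 
three predicate evaluations for six cells, and the saving grows with run 
length.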

> Push predicate evaluation into more CFile decoders
> --------------------------------------------------
>
>                 Key: KUDU-2852
>                 URL: https://issues.apache.org/jira/browse/KUDU-2852
>             Project: Kudu
>          Issue Type: Improvement
>          Components: cfile, perf
>            Reporter: Andrew Wong
>            Assignee: Mitch Barnett
>            Priority: Major
>              Labels: newbie
>
> Commit c0f3727 added an optimization to push predicate evaluation into the 
> CFile decoders without fully materializing the contents of each cblock. It 
> did this with dictionary-encoded blocks, but the optimization can be applied 
> to other encoding types too.
> The RLE decoders are low-hanging fruit: they should be able to evaluate the 
> predicate once per run instead of materializing each cell and then applying 
> the predicate.
> KUDU-736 also notes that we may be able to apply some predicates on 
> bitshuffled data.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
