Hi 何李夫,

Just making sure I understand your point:
In MaterializingIterator::MaterializeBlock(), we iterate through all of the column predicates, and for each one we start reading from the beginning of the column block. Your point is that once we've evaluated a single column predicate, we may already know that some rows don't belong in the result set. It then stands to reason that we should be able to advance cur_idx_ for the entire rowwise iterator (CFileSet::Iterator) so that we don't reconsider rows we've already filtered out.

This would probably save some time with most block decoders, which don't take the selection vector into account at all. Some decoders (like the dictionary decoder) already take the existing selection vector into account and avoid the unnecessary materialization. So you're probably right that we could save some cycles here.

There might be some gotchas in actually implementing this, since the hierarchy of cfile sets, cfile readers, and decoders is a bit complex. It seems you've already got a fair amount of context, so feel free to try it out!

Andrew

On Tue, Apr 10, 2018 at 12:37 AM, helifu <[email protected]> wrote:

> Hi all,
>
> I read the function of ‘MaterializingIterator::MaterializeBlock’ carefully, and found that there is something we can do. If we adjust the ‘cur_idx_’ in ‘CFileSet::Iterator’ to the valid left of ‘dst->selection_vector’ after each loop of predicate evaluation, we could skip reading some unnecessary ‘data_block’ and that will help to speed up. Am I right? :)
>
> 何李夫
>
> 2017-04-10 16:06:24

--
Andrew Wong
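For illustration, here is a toy C++ model of the idea discussed in this thread. It is not Kudu's actual MaterializeBlock or CFileSet::Iterator. Every name in it (Column, Predicate, ScanStats, FirstSelected, the skip_prefix flag, and this standalone MaterializeBlock) is hypothetical. It assumes a decoder that ignores the selection vector and pays one unit of work per row it decodes. Predicates are evaluated column by column; with skip_prefix enabled, each subsequent column is read starting at the first row that is still selected, which is roughly the analogue of advancing cur_idx_.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for one column of a block: plain values, no encoding.
using Column = std::vector<int>;
// Hypothetical single-column predicate evaluated per row.
using Predicate = std::function<bool(int)>;

struct ScanStats {
  std::size_t values_decoded = 0;  // cells the toy "decoder" touched
};

// Index of the first selected row, or sel.size() if no row is selected.
std::size_t FirstSelected(const std::vector<bool>& sel) {
  for (std::size_t i = 0; i < sel.size(); ++i) {
    if (sel[i]) return i;
  }
  return sel.size();
}

// Evaluates one predicate per column (preds.size() must equal cols.size()).
// If skip_prefix is true, each column is decoded from the first row that is
// still selected instead of from row 0. Rows before that point are already
// deselected, so skipping them cannot change the result.
std::vector<bool> MaterializeBlock(const std::vector<Column>& cols,
                                   const std::vector<Predicate>& preds,
                                   bool skip_prefix, ScanStats* stats) {
  const std::size_t nrows = cols.empty() ? 0 : cols[0].size();
  std::vector<bool> sel(nrows, true);
  for (std::size_t c = 0; c < cols.size(); ++c) {
    const std::size_t start = skip_prefix ? FirstSelected(sel) : 0;
    for (std::size_t row = start; row < nrows; ++row) {
      ++stats->values_decoded;  // this decoder decodes every row it visits
      if (sel[row] && !preds[c](cols[c][row])) sel[row] = false;
    }
  }
  return sel;
}
```

Take two six-row columns {1..6} and {10..60} with predicates `x >= 4` and `y < 60`. Both modes select the same rows (indices 3 and 4). The full scan decodes 12 values, while the prefix-skipping scan decodes 9, because the second column starts at row 3. When a predicate deselects every row, FirstSelected returns the row count and the remaining columns are not decoded at all. That case is closest to skipping an unnecessary data block.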
