[ 
https://issues.apache.org/jira/browse/KUDU-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612558#comment-16612558
 ] 

Adar Dembo commented on KUDU-686:
---------------------------------

Sure.

What's alluded to here is that {{DeltaFileIterator::ApplyUpdates}} must decode 
all of the prepared deltas, but then applies only the deltas for the column 
whose ID was passed in as an argument. When the iterator's projection includes 
multiple columns, this repeated decoding becomes quite CPU-intensive.
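
To make the cost concrete, here is a minimal sketch of the decode-per-call 
pattern. It is not Kudu's actual code: {{Delta}}, {{EncodedDelta}}, 
{{NaiveIterator}}, and the {{decode_calls}} counter are hypothetical stand-ins, 
and "decoding" is simulated by copying a struct.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical decoded form of one delta: a new value for one cell.
struct Delta {
  size_t row;
  int col_id;
  int64_t value;
};

// Hypothetical stand-in for a serialized delta. Real deltas would need
// actual parsing; here the payload is simply wrapped.
struct EncodedDelta {
  Delta payload;
};

// Illustrative iterator that decodes every prepared delta on each
// ApplyUpdates call, even though only one column's deltas are applied.
struct NaiveIterator {
  std::vector<EncodedDelta> prepared;
  size_t decode_calls = 0;

  Delta Decode(const EncodedDelta& e) {
    ++decode_calls;  // Counts the (simulated) decoding work.
    return e.payload;
  }

  void ApplyUpdates(int col_id, std::vector<int64_t>* column) {
    for (const EncodedDelta& e : prepared) {
      Delta d = Decode(e);       // Paid for every delta...
      if (d.col_id == col_id) {  // ...but only matching ones are used.
        (*column)[d.row] = d.value;
      }
    }
  }
};
```

In this sketch, a batch of N prepared deltas and a projection of C columns 
costs N * C decode calls, since each {{ApplyUpdates}} call walks the whole batch.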

The {{DMSIterator}} used to work this way too, but commit fb58895a7 changed it 
to do all of the delta decoding at {{PrepareBatch}} time, and to store the 
decoded deltas in per-column queues. Then, when {{ApplyUpdates}} is called, the 
appropriate queue is walked and its contents are copied over into the 
{{ColumnBlock}}. Thus, no unnecessary work is performed by {{ApplyUpdates}}, 
and, as far as the {{DMSIterator}} is concerned, there's no need for the 
"multi-pass" algorithm described earlier in the bug report.

All I'm doing is recognizing that the {{DMSIterator}} logic responsible for 
decoding and organizing deltas in {{PrepareBatch}} and applying them in 
{{ApplyUpdates}} is actually quite generic and can be reused by 
{{DeltaFileIterator}} too. I'm tackling this because such a refactor simplifies 
incremental backup support.
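
The decode-once, apply-per-column split described above can be sketched 
roughly as follows. This is only an illustration under assumed names, not the 
code from the commit: {{QueueingIterator}}, {{DecodedUpdate}}, {{EncodedDelta}}, 
and the {{decode_calls()}} accessor are all hypothetical, and a plain 
{{std::vector<int64_t>}} stands in for a {{ColumnBlock}}.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical decoded form of one delta: a new value for one cell.
struct Delta {
  size_t row;
  int col_id;
  int64_t value;
};

// Hypothetical stand-in for a serialized delta.
struct EncodedDelta {
  Delta payload;
};

// One entry in a per-column queue: the column is implied by the queue.
struct DecodedUpdate {
  size_t row;
  int64_t value;
};

class QueueingIterator {
 public:
  // Decodes each delta exactly once and files it under its column ID.
  void PrepareBatch(const std::vector<EncodedDelta>& batch) {
    queues_.clear();
    for (const EncodedDelta& e : batch) {
      Delta d = Decode(e);
      queues_[d.col_id].push_back({d.row, d.value});
    }
  }

  // Walks only the queue for the requested column; no decoding happens here.
  void ApplyUpdates(int col_id, std::vector<int64_t>* column) const {
    auto it = queues_.find(col_id);
    if (it == queues_.end()) return;  // No deltas for this column.
    for (const DecodedUpdate& u : it->second) {
      (*column)[u.row] = u.value;
    }
  }

  size_t decode_calls() const { return decode_calls_; }

 private:
  Delta Decode(const EncodedDelta& e) {
    ++decode_calls_;  // Counts the (simulated) decoding work.
    return e.payload;
  }

  std::unordered_map<int, std::vector<DecodedUpdate>> queues_;
  size_t decode_calls_ = 0;
};
```

In this sketch the number of decode calls equals the number of deltas in the 
batch, no matter how many columns {{ApplyUpdates}} is later called for. Since 
nothing in it depends on where the encoded deltas came from, the same shape 
could in principle sit behind either iterator.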

> Delta apply optimizations
> -------------------------
>
>                 Key: KUDU-686
>                 URL: https://issues.apache.org/jira/browse/KUDU-686
>             Project: Kudu
>          Issue Type: Improvement
>          Components: perf, tablet
>    Affects Versions: M4.5
>            Reporter: David Alves
>            Assignee: Adar Dembo
>            Priority: Trivial
>
> We currently iterate on each delta file several times, once for deletes and 
> then once for each of the columns.
> It seems that, when selecting all the columns, it would be more efficient to 
> apply the deltas to all columns at the same time. This might or might not be 
> advantageous depending on the number of columns projected. Todd also 
> suggested that whether this is an advantage also depends on whether there are 
> predicates being pushed down.
> We could likely also merge the updates and deletes into a single iteration or 
> at least avoid applying the mutations if the row will end up deleted (right 
> now we still apply the updates even when we find that the row will be 
> deleted).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
