[
https://issues.apache.org/jira/browse/KUDU-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adar Dembo reassigned KUDU-686:
-------------------------------
Assignee: Adar Dembo
I'm going to fix this using a different tack: port the {{PrepareBatch}}-heavy
approach from {{DMSIterator}} to {{DeltaFileIterator}}. The idea is to do a lot
more work in {{PrepareBatch}} such that there's no need for any IO in the
per-column {{ApplyUpdates}} calls later on.
With that in place, there's no longer any need to switch to the "multi-pass"
approach outlined earlier in this bug report.
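To make the idea concrete, here is a rough, self-contained sketch of what a {{PrepareBatch}}-heavy iterator could look like. It is not the Kudu code: {{FakeDeltaFile}}, {{Delta}}, and {{BatchedDeltaIterator}} are hypothetical stand-ins invented for illustration, and the real {{DeltaFileIterator}} interface differs. The only point is that the single read happens in {{PrepareBatch}} and the per-column {{ApplyUpdates}} calls touch memory only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical decoded delta: "set column col_id of row row_idx to value".
struct Delta {
  size_t row_idx;
  int col_id;
  int64_t value;
};

// Hypothetical stand-in for an on-disk delta file. Reads are counted so that
// the amount of simulated IO is visible.
class FakeDeltaFile {
 public:
  explicit FakeDeltaFile(std::vector<Delta> deltas)
      : deltas_(std::move(deltas)) {}

  // Simulates one pass of IO, returning deltas for rows in [start, start + n).
  std::vector<Delta> ReadRange(size_t start, size_t n) {
    ++num_reads_;
    std::vector<Delta> out;
    for (const Delta& d : deltas_) {
      if (d.row_idx >= start && d.row_idx < start + n) out.push_back(d);
    }
    return out;
  }

  int num_reads() const { return num_reads_; }

 private:
  std::vector<Delta> deltas_;
  int num_reads_ = 0;
};

// Hypothetical iterator that front-loads all IO into PrepareBatch.
class BatchedDeltaIterator {
 public:
  explicit BatchedDeltaIterator(FakeDeltaFile* file) : file_(file) {}

  // Reads the batch once and groups the decoded deltas by column.
  void PrepareBatch(size_t start, size_t nrows) {
    batch_start_ = start;
    batch_rows_ = nrows;
    prepared_.clear();
    for (const Delta& d : file_->ReadRange(start, nrows)) {
      prepared_[d.col_id].push_back(d);
    }
  }

  // Applies the buffered deltas for one column. No IO happens here.
  void ApplyUpdates(int col_id, std::vector<int64_t>* column) const {
    assert(column->size() == batch_rows_);
    auto it = prepared_.find(col_id);
    if (it == prepared_.end()) return;
    // Deltas are kept in file order, so a later delta for a row wins.
    for (const Delta& d : it->second) {
      (*column)[d.row_idx - batch_start_] = d.value;
    }
  }

 private:
  FakeDeltaFile* file_;
  size_t batch_start_ = 0;
  size_t batch_rows_ = 0;
  std::map<int, std::vector<Delta>> prepared_;
};
```

In this sketch, calling {{ApplyUpdates}} for any number of columns after one {{PrepareBatch}} leaves the read count at one, which is the property that makes a separate multi-pass scheme unnecessary.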
> Delta apply optimizations
> -------------------------
>
> Key: KUDU-686
> URL: https://issues.apache.org/jira/browse/KUDU-686
> Project: Kudu
> Issue Type: Improvement
> Components: perf, tablet
> Affects Versions: M4.5
> Reporter: David Alves
> Assignee: Adar Dembo
> Priority: Trivial
>
> We currently iterate over each delta file several times: once for deletes
> and then once for each of the columns.
> It seems that, when selecting all the columns, it would be more efficient to
> apply the deltas to all columns at the same time. This might or might not be
> advantageous depending on the number of columns projected. Todd also suggests
> that whether this is an advantage depends on whether there are predicates
> being pushed down.
> We could likely also merge the updates and deletes into a single iteration,
> or at least avoid applying the mutations if the row will end up deleted
> (right now we still apply the updates even when we find that the row will be
> deleted).
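
For reference, the "avoid applying mutations to rows that end up deleted" variant from the description could be sketched as below. This is only an illustration under assumed names: {{Mutation}}, {{BatchResult}}, and {{ApplyAllColumns}} are hypothetical and do not exist in Kudu. It assumes the batch's mutations are already in memory, marks deleted rows first, and then applies updates to all columns in one go while skipping deleted rows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mutation record; a real delta store encodes these differently.
struct Mutation {
  size_t row_idx;
  bool is_delete;  // if true, col_id and value are ignored
  size_t col_id;
  int64_t value;
};

struct BatchResult {
  std::vector<bool> deleted;   // one flag per row in the batch
  size_t updates_applied = 0;  // updates actually written to a column
};

// Applies a batch of in-memory mutations to all columns at once.
// columns is indexed as (*columns)[col_id][row_idx].
BatchResult ApplyAllColumns(const std::vector<Mutation>& muts,
                            std::vector<std::vector<int64_t>>* columns,
                            size_t nrows) {
  BatchResult res;
  res.deleted.assign(nrows, false);

  // Cheap in-memory scan: find every row that ends up deleted.
  for (const Mutation& m : muts) {
    assert(m.row_idx < nrows);
    if (m.is_delete) res.deleted[m.row_idx] = true;
  }

  // Apply updates to every column, skipping rows known to be deleted.
  for (const Mutation& m : muts) {
    if (m.is_delete || res.deleted[m.row_idx]) continue;
    assert(m.col_id < columns->size());
    (*columns)[m.col_id][m.row_idx] = m.value;
    ++res.updates_applied;
  }
  return res;
}
```

Both loops run over memory only, so this is compatible with doing the IO once up front. Whether it beats per-column application presumably still depends on how many columns are projected and on predicate pushdown, as noted in the description.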