[
https://issues.apache.org/jira/browse/KUDU-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16415901#comment-16415901
]
Todd Lipcon commented on KUDU-2381:
-----------------------------------
Looking at the perf report, it seems like the majority of CPU time is in two spots:
In MayHaveDeltas:
{code}
for (auto& col : updates_by_col_) {
  if (!col.empty()) {
    return true;
  }
}
{code}
And in PrepareBatch:
{code}
for (UpdatesForColumn& ufc : updates_by_col_) {
  ufc.clear();
}
{code}
Both of these loops do no useful work when there are no updates in the
previously-scanned blocks, yet they still visit every column. So, I think it
would make sense to be more lazy about initializing updates_by_col_.
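As a rough illustration of the lazy approach, here is a minimal standalone sketch. It is not the actual Kudu code: LazyColumnUpdates, ColumnUpdate, AddUpdate, Reset and the has_updates_ flag are hypothetical names made up for this example, and only updates_by_col_ and MayHaveDeltas come from the snippets above. The idea is to defer sizing the per-column vectors until the first update arrives and to track a single flag, so both hot loops can be skipped when nothing matched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a per-column update record; not Kudu's real type.
struct ColumnUpdate {
  uint32_t row_idx;
  int64_t new_value;
};

// Hypothetical sketch of a lazily initialized per-column update buffer.
class LazyColumnUpdates {
 public:
  explicit LazyColumnUpdates(size_t num_cols) : num_cols_(num_cols) {}

  // Sizes the per-column vectors only when the first update arrives.
  void AddUpdate(size_t col_idx, const ColumnUpdate& update) {
    assert(col_idx < num_cols_);
    if (updates_by_col_.empty()) {
      updates_by_col_.resize(num_cols_);
    }
    updates_by_col_[col_idx].push_back(update);
    has_updates_ = true;
  }

  // Checks one flag instead of scanning every column.
  bool MayHaveDeltas() const { return has_updates_; }

  // Intended to run at the start of each batch; skips the per-column
  // loop entirely when no update was recorded since the last reset.
  void Reset() {
    if (!has_updates_) return;
    for (auto& col : updates_by_col_) {
      col.clear();
    }
    has_updates_ = false;
  }

 private:
  size_t num_cols_;
  bool has_updates_ = false;
  std::vector<std::vector<ColumnUpdate>> updates_by_col_;
};
```

With something along these lines, a scan over 280 columns with no matching deltas would pay for one branch per batch in each of the two spots rather than 280 iterations.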
> Optimize DeltaMemStore for case of no matching deltas
> -----------------------------------------------------
>
> Key: KUDU-2381
> URL: https://issues.apache.org/jira/browse/KUDU-2381
> Project: Kudu
> Issue Type: Improvement
> Components: perf, tablet
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Major
>
> Currently in a scan workload which scans 280 columns I see DeltaMemStore
> iteration taking up a significant amount of CPU in the scan, despite the fact
> that the dataset has no updates. Of 1.6sec in
> MaterializingIterator::NextBlock, we spent 0.61s in DMSIterator::PrepareBatch
> and 0.14s in DMSIterator::MayHaveDeltas. So, about 46% of our time here is on
> wasted work.