[ https://issues.apache.org/jira/browse/KUDU-749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301309#comment-15301309 ]
Todd Lipcon commented on KUDU-749:
----------------------------------
Just addressed the quadratic-time delta collection in commit
c7178e97e842f42e9ed9d5e9e2a4f521fbe70b6b.
Another thing I'm noticing is that the "heavy write" DiskRowSets aren't getting
aggressively major-delta-compacted. The issue is that we only look at the ratio
of total delta file size to base data size, not at a more realistic measure of
read performance. One quick thought is to capture counters at runtime for the
number of deltas _applied during reads_ on each DRS vs. the _number of rows
read_. For a zipfian update pattern, where a few hot rows accumulate most of
the updates, the ratio will be quite high (e.g. 1000:1), whereas in the more
analytic use case where a single column has been updated once across all rows,
the ratio will be closer to 1:1.
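A minimal sketch of the proposed read-path counters, written in Python for brevity rather than Kudu's C++. The names `DrsReadStats` and `should_major_compact`, and the threshold of 10 deltas per row, are hypothetical illustrations, not existing Kudu code or tunables:

```python
from dataclasses import dataclass


@dataclass
class DrsReadStats:
    """Hypothetical per-DiskRowSet counters, bumped on the read path."""
    rows_read: int = 0
    deltas_applied: int = 0

    def record_batch(self, rows: int, deltas: int) -> None:
        # Called once per scanned batch with the rows returned and the
        # number of delta mutations applied to produce them.
        self.rows_read += rows
        self.deltas_applied += deltas

    def deltas_per_row(self) -> float:
        # A DRS that has never been read has no signal yet.
        if self.rows_read == 0:
            return 0.0
        return self.deltas_applied / self.rows_read


def should_major_compact(stats: DrsReadStats, threshold: float = 10.0) -> bool:
    """Hypothetical policy: compact when reads apply many deltas per row."""
    return stats.deltas_per_row() >= threshold


# Zipfian case: reading one hot row requires applying 1000 pending updates.
hot = DrsReadStats()
hot.record_batch(rows=1, deltas=1000)

# Analytic case: one column updated once across all rows, then a full scan.
analytic = DrsReadStats()
analytic.record_batch(rows=1_000_000, deltas=1_000_000)

print(hot.deltas_per_row(), should_major_compact(hot))            # 1000.0 True
print(analytic.deltas_per_row(), should_major_compact(analytic))  # 1.0 False
```

Both rowsets could carry a similar delta-to-base size ratio, but only the hot one is expensive to read, which is the distinction the size-only heuristic misses.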
The potential downside of this read-dependent tracking is that it wouldn't
work as well on replicas that don't see a heavy read workload: after a leader
change, readers would shift to those unoptimized replicas and we'd see a big
latency spike.
> Improve performance for zipfian update
> --------------------------------------
>
> Key: KUDU-749
> URL: https://issues.apache.org/jira/browse/KUDU-749
> Project: Kudu
> Issue Type: Improvement
> Components: perf, tablet
> Affects Versions: Private Beta
> Reporter: Todd Lipcon
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> A zipfian 50/50 update/read workload on YCSB gets slower and slower until
> it's pretty intolerable (random reads taking 100+ms of CPU). It seems like
> all the CPU is spent in DMSIterator::PrepareBatch. We're probably doing
> something dumb here - let's look for some low-hanging fruit to fix this.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)