[
https://issues.apache.org/jira/browse/PHOENIX-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987521#comment-16987521
]
Lars Hofhansl edited comment on PHOENIX-5494 at 12/4/19 4:46 AM:
-----------------------------------------------------------------
It's all about what percentage of the time is spent setting up the
scanners and seeking per row.
In my test I ran this with WAL writing turned off, and I tested with local
indexes, so the percentage improvement would be larger than in a typical
setup, although the absolute improvement would be the same. And obviously the
larger the batch size, the better the improvement; I had set the batch size
to 1000 (the default is 100).
33% end-to-end is nothing to scoff at!
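To make the percentage-versus-absolute point concrete, here is a small
sketch of the arithmetic. All timings below are hypothetical and chosen only
for illustration; they are not measurements from this issue. The class and
method names are likewise made up for the example.

```java
import java.util.Locale;

// Hypothetical numbers only: the same absolute saving looks larger as a
// percentage when other fixed costs (such as WAL writes) are absent.
final class ImprovementMath {
    /** Fraction of the total batch time that is saved. */
    static double relativeImprovement(double totalMs, double savedMs) {
        return savedMs / totalMs;
    }

    public static void main(String[] args) {
        double savedMs = 30.0;     // assumed saving from fewer scanner setups
        double otherWorkMs = 60.0; // assumed remaining work per batch, WAL off
        double walMs = 60.0;       // assumed extra cost per batch with WAL on

        double totalWalOff = savedMs + otherWorkMs;         // 90 ms
        double totalWalOn = savedMs + otherWorkMs + walMs;  // 150 ms

        System.out.printf(Locale.ROOT, "WAL off: %.1f%%%n",
                100 * relativeImprovement(totalWalOff, savedMs));
        System.out.printf(Locale.ROOT, "WAL on:  %.1f%%%n",
                100 * relativeImprovement(totalWalOn, savedMs));
    }
}
```

With these assumed numbers the saving is 30 ms in both cases, but it is about
a third of the batch time with WAL off and a fifth with WAL on.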
> Batched, mutable Index updates are unnecessarily run one-by-one
> ---------------------------------------------------------------
>
> Key: PHOENIX-5494
> URL: https://issues.apache.org/jira/browse/PHOENIX-5494
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.15.0, 5.1.0
> Reporter: Lars Hofhansl
> Assignee: chenglei
> Priority: Major
> Labels: performance
> Fix For: 4.15.0, 5.1.0
>
> Attachments: 5494-4.x-HBase-1.5.txt,
> PHOENIX-5494-4.x-HBase-1.4.patch, PHOENIX-5494.master.001.patch,
> PHOENIX-5494.master.002.patch, PHOENIX-5494.master.003.patch,
> PHOENIX-5494_v9-4.x-HBase-1.4.patch, PHOENIX-5494_v9-master.patch,
> Screenshot_20191110_160243.png, Screenshot_20191110_160351.png,
> Screenshot_20191110_161453.png
>
> Time Spent: 5h 50m
> Remaining Estimate: 0h
>
> I just noticed that index updates on mutable tables retrieve their deletes
> (to invalidate the old index entry) one-by-one.
> For batches, this can be *the* major time spent during an index update. The
> cost is mostly incurred by the repeated setup (and seeking) of a new region
> scanner (for each row).
> We can instead do a skip scan and get all updates in a single scan per region.
> (Logically that is simple, but it will require some refactoring)
> I won't be getting to this, but recording it here in case someone feels
> inclined.
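
The difference between the two approaches described above can be sketched
roughly as follows. This is not Phoenix or HBase code: a sorted in-memory map
(with natural String ordering) stands in for a region, a counter stands in
for the scanner setup cost, and all names are hypothetical. The batched
variant here walks every row between the first and last wanted key, whereas a
real skip scan would seek past the gaps.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeSet;

final class BatchedLookupSketch {
    /** Number of "scanners" opened; a stand-in for per-scanner setup cost. */
    static int scannersOpened = 0;

    /** One-by-one: open a fresh scanner (a tail view iterator) per row key. */
    static List<String> lookupOneByOne(NavigableMap<String, String> region,
                                       List<String> rowKeys) {
        List<String> found = new ArrayList<>();
        for (String key : rowKeys) {
            scannersOpened++;
            Iterator<Map.Entry<String, String>> it =
                    region.tailMap(key, true).entrySet().iterator();
            if (it.hasNext()) {
                Map.Entry<String, String> row = it.next();
                if (row.getKey().equals(key)) {
                    found.add(row.getValue());
                }
            }
        }
        return found;
    }

    /** Batched: sort the wanted keys, open one scanner, walk forward once. */
    static List<String> lookupBatched(NavigableMap<String, String> region,
                                      List<String> rowKeys) {
        List<String> found = new ArrayList<>();
        TreeSet<String> wanted = new TreeSet<>(rowKeys); // sorted, de-duplicated
        if (wanted.isEmpty()) {
            return found;
        }
        scannersOpened++;
        Iterator<Map.Entry<String, String>> it = region
                .subMap(wanted.first(), true, wanted.last(), true)
                .entrySet().iterator();
        Iterator<String> keys = wanted.iterator();
        String want = keys.next();
        while (it.hasNext()) {
            Map.Entry<String, String> row = it.next();
            // Skip wanted keys that sort before the current row.
            while (want.compareTo(row.getKey()) < 0) {
                if (!keys.hasNext()) {
                    return found;
                }
                want = keys.next();
            }
            if (want.equals(row.getKey())) {
                found.add(row.getValue());
            }
        }
        return found;
    }
}
```

For a batch of N row keys the first method opens N scanners and the second
opens one. Note that the batched variant returns results in sorted key order
and ignores duplicate keys, while the one-by-one variant follows the input
order.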