[
https://issues.apache.org/jira/browse/PHOENIX-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987618#comment-16987618
]
Kadir OZDEMIR commented on PHOENIX-5494:
----------------------------------------
I used the default batch size. In my test, there are 8 client threads. Each
client commits 10000 rows at a time. The row keys for these rows are almost
random. During the test, 200+ million rows were written. At the end of the test,
the tables had 24 regions. The drop in average latency is 21%, from 4023ms to
3177ms. Sorry, I incorrectly wrote 33% earlier; I mentally calculated the
percentage of the diff relative to 3177 instead of 4023.
Please note that in my test there are two indexes on the data table. The data
table is configured with the new index design (IndexRegionObserver). Every data
table update consists of the following steps, executed sequentially: (1) one
local data table row read, (2) two-to-four parallel (possibly remote) index row
writes (first-phase writes), (3) one local data table row write, and (4)
two-to-four parallel (possibly remote) index row writes (third-phase writes).
If we assume the two-to-four parallel writes take as much time as a single write
does (in reality they will take longer), then each update includes one local
read, one local write, and 2 remote writes. If we further assume that remote
writes take as much time as local writes do (in reality remote writes will take
longer), then each update includes one read and 3 writes. Now if we assume reads
and writes take approximately the same amount of time, then making reads a
zero-cost operation will result in about a 25% improvement. Yes, reads are more
expensive than writes, but the reads are local and the writes are mostly remote.
Given that I did not consider additional latencies due to row lock contention
and the RPCs between clients and servers, a 21% improvement is not bad at all.
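The back-of-envelope arithmetic above can be sketched as a quick cost model. The unit costs here are illustrative assumptions taken from the comment (parallel writes count as one write, remote writes cost the same as local ones, reads cost the same as writes), not measurements:

```python
# Cost model for one data-table update under the IndexRegionObserver
# design, per the simplifying assumptions in the comment above.
READ_COST = 1.0   # step (1): local data-table row read
WRITE_COST = 1.0  # one unit per write phase

def update_cost(read_cost: float, write_cost: float) -> float:
    # 1 read + 3 writes: the local data-table write plus the two
    # (parallelized, so counted once each) index write phases.
    return read_cost + 3 * write_cost

baseline = update_cost(READ_COST, WRITE_COST)   # 4 units
no_read = update_cost(0.0, WRITE_COST)          # 3 units: read made free
improvement = (baseline - no_read) / baseline   # 0.25
print(f"theoretical ceiling: {improvement:.0%}")
```

Under these assumptions the ceiling is 25%, so the measured 21% (which still pays for row-lock contention and client-server RPCs) is close to the best this optimization could deliver.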
> Batched, mutable Index updates are unnecessarily run one-by-one
> ---------------------------------------------------------------
>
> Key: PHOENIX-5494
> URL: https://issues.apache.org/jira/browse/PHOENIX-5494
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.15.0, 5.1.0
> Reporter: Lars Hofhansl
> Assignee: chenglei
> Priority: Major
> Labels: performance
> Fix For: 4.15.0, 5.1.0
>
> Attachments: 5494-4.x-HBase-1.5.txt,
> PHOENIX-5494-4.x-HBase-1.4.patch, PHOENIX-5494.master.001.patch,
> PHOENIX-5494.master.002.patch, PHOENIX-5494.master.003.patch,
> PHOENIX-5494_v9-4.x-HBase-1.4.patch, PHOENIX-5494_v9-master.patch,
> Screenshot_20191110_160243.png, Screenshot_20191110_160351.png,
> Screenshot_20191110_161453.png
>
> Time Spent: 5h 50m
> Remaining Estimate: 0h
>
> I just noticed that index updates on mutable tables retrieve their deletes
> (to invalidate the old index entry) one-by-one.
> For batches, this can be *the* major time spent during an index update. The
> cost is mostly incurred by the repeated setup (and seeking) of a new region
> scanner (for each row).
> We can instead do a skip scan and get all updates in a single scan per region.
> (Logically that is simple, but it will require some refactoring)
> I won't be getting to this, but recording it here in case someone feels
> inclined.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)