[
https://issues.apache.org/jira/browse/PHOENIX-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987618#comment-16987618
]
Kadir OZDEMIR commented on PHOENIX-5494:
----------------------------------------
I used the default batch size. In my test, there are 8 client threads. Each
client commits 10000 rows at a time. The row keys for these rows are almost
random. During the test, 200+ million rows were written. At the end of the test,
the tables had 24 regions. The drop in average latency is 21%, from 4023ms to
3177ms. Sorry, I incorrectly wrote 33% earlier; I mentally calculated the
percentage of the diff relative to 3177 instead of 4023.
Please note that in my test there are two indexes on the data table. The data
table is configured with the new index design (IndexRegionObserver). Every data
table update consists of the following steps, executed sequentially: (1) one
local data table row read, (2) two-to-four parallel (possibly remote) index row
writes (first-phase writes), (3) one local data table row write, and (4)
two-to-four parallel (possibly remote) index row writes (third-phase writes).
If we assume the two-to-four parallel writes take as much time as a single write
does (in reality they will take longer), then each update includes one local
read, one local write, and 2 remote writes. If we further assume that remote
writes take as much time as local writes do (in reality remote writes will take
longer), then each update includes one read and 3 writes. Now if we assume reads
and writes take approximately the same amount of time, then making reads a
zero-cost operation will result in about a 25% improvement. Yes, reads are more
expensive than writes, but the reads are local and the writes are mostly remote.
Given that I did not consider additional latencies due to row lock contention
and the RPCs between clients and servers, a 21% improvement is not bad at all.
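The back-of-envelope arithmetic above can be sketched as a quick cost model. The unit costs here are illustrative assumptions taken from the comment (parallel writes count as one write, remote writes cost the same as local ones, reads cost the same as writes), not measurements:

```python
# Cost model for one data-table update under the IndexRegionObserver
# design, per the simplifying assumptions in the comment above.
READ_COST = 1.0   # step (1): local data-table row read
WRITE_COST = 1.0  # one unit per write phase

def update_cost(read_cost: float, write_cost: float) -> float:
    # 1 read + 3 writes: the local data-table write plus the two
    # (parallelized, so counted once each) index write phases.
    return read_cost + 3 * write_cost

baseline = update_cost(READ_COST, WRITE_COST)   # 4 units
no_read = update_cost(0.0, WRITE_COST)          # 3 units: read made free
improvement = (baseline - no_read) / baseline   # 0.25
print(f"theoretical ceiling: {improvement:.0%}")
```

Under these assumptions the ceiling is 25%, so the measured 21% (which still pays for row-lock contention and client-server RPCs) is close to the best this optimization could deliver.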
> Batched, mutable Index updates are unnecessarily run one-by-one
> ---------------------------------------------------------------
>
> Key: PHOENIX-5494
> URL: https://issues.apache.org/jira/browse/PHOENIX-5494
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.15.0, 5.1.0
> Reporter: Lars Hofhansl
> Assignee: chenglei
> Priority: Major
> Labels: performance
> Fix For: 4.15.0, 5.1.0
>
> Attachments: 5494-4.x-HBase-1.5.txt,
> PHOENIX-5494-4.x-HBase-1.4.patch, PHOENIX-5494.master.001.patch,
> PHOENIX-5494.master.002.patch, PHOENIX-5494.master.003.patch,
> PHOENIX-5494_v9-4.x-HBase-1.4.patch, PHOENIX-5494_v9-master.patch,
> Screenshot_20191110_160243.png, Screenshot_20191110_160351.png,
> Screenshot_20191110_161453.png
>
> Time Spent: 5h 50m
> Remaining Estimate: 0h
>
> I just noticed that index updates on mutable tables retrieve their deletes
> (to invalidate the old index entry) one-by-one.
> For batches, this can be *the* major time spent during an index update. The
> cost is mostly incurred by the repeated setup (and seeking) of a new region
> scanner (for each row).
> We can instead do a skip scan and get all updates in a single scan per region.
> (Logically that is simple, but it will require some refactoring)
> I won't be getting to this, but recording it here in case someone feels
> inclined.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)