[ 
https://issues.apache.org/jira/browse/PHOENIX-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975309#comment-16975309
 ] 

Kadir OZDEMIR commented on PHOENIX-5494:
----------------------------------------

[~comnetwork], yes, you can filter out cells based on timestamps if we 
continue to do raw scans with all versions, which is something I would like to 
get rid of eventually, at least for the new design. You also need to find the 
max timestamp for a batch of replay writes and set the scan time range accordingly.
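
A minimal sketch of the timestamp handling described above, in plain Java without any HBase or Phoenix dependency. TimedCell, ReplayScanSketch and all method names are hypothetical and are not taken from the Phoenix code base. Whether a cell written exactly at the replay timestamp should be visible is an assumption here (treated as visible).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an HBase cell; only the fields this sketch needs.
final class TimedCell {
    final String row;
    final long timestamp;

    TimedCell(String row, long timestamp) {
        this.row = row;
        this.timestamp = timestamp;
    }
}

final class ReplayScanSketch {
    // Largest timestamp among the replayed writes in the batch.
    // Returns Long.MIN_VALUE for an empty batch.
    static long maxTimestamp(List<TimedCell> batch) {
        long max = Long.MIN_VALUE;
        for (TimedCell cell : batch) {
            max = Math.max(max, cell.timestamp);
        }
        return max;
    }

    // HBase time ranges are half-open, [min, max), so the upper bound has to be
    // one past the largest timestamp that should still be returned.
    static long exclusiveUpperBound(long maxTs) {
        return maxTs == Long.MAX_VALUE ? Long.MAX_VALUE : maxTs + 1;
    }

    // Keep only the raw-scan cells written at or before the replay timestamp.
    // Assumption: the boundary is inclusive.
    static List<TimedCell> cellsVisibleAt(List<TimedCell> rawCells, long replayTs) {
        List<TimedCell> visible = new ArrayList<>();
        for (TimedCell cell : rawCells) {
            if (cell.timestamp <= replayTs) {
                visible.add(cell);
            }
        }
        return visible;
    }
}
```

With a real scan, the value from exclusiveUpperBound() would be the upper argument of the scan's time range, and cellsVisibleAt() would then be applied per replayed mutation.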

For regular writes, we lock the data table rows during index updates (see 
IndexRegionObserver.lockRows()). This guarantees that only one update can be 
prepared for a given row at a time. Given that only one thread can be working 
on a given row, we only need to ensure that entries for different rows can be 
added to and removed from the map of mutations concurrently. For this, I used a 
ConcurrentHashMap (see LocalTable.scanCurrentRowStates(), where results = new 
ConcurrentHashMap<>()). 
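
The division of labor between the row lock and the concurrent map can be illustrated with a small standalone sketch. It does not reproduce IndexRegionObserver or LocalTable; RowLockSketch, applyUpdate(), versionOf() and runDemo() are hypothetical names, and a plain ReentrantLock per row key is assumed in place of the region's row locks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

final class RowLockSketch {
    // One lock per row key (hypothetical stand-in for the region's row locks).
    private final ConcurrentHashMap<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();
    // Per-row state; threads holding locks on different rows touch this map concurrently.
    private final ConcurrentHashMap<String, Integer> results = new ConcurrentHashMap<>();

    // The read-modify-write below is safe only because the row lock is held;
    // the ConcurrentHashMap protects the map structure across different rows.
    void applyUpdate(String rowKey) {
        ReentrantLock lock = rowLocks.computeIfAbsent(rowKey, k -> new ReentrantLock());
        lock.lock();
        try {
            int current = results.getOrDefault(rowKey, 0);
            results.put(rowKey, current + 1);
        } finally {
            lock.unlock();
        }
    }

    int versionOf(String rowKey) {
        return results.getOrDefault(rowKey, 0);
    }

    // Starts the given number of threads; thread t updates row "row0" or "row1"
    // depending on t % 2, so every row is shared by several threads.
    static RowLockSketch runDemo(int threads, int updatesPerThread) {
        RowLockSketch table = new RowLockSketch();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final String rowKey = "row" + (t % 2);
            Thread worker = new Thread(() -> {
                for (int i = 0; i < updatesPerThread; i++) {
                    table.applyUpdate(rowKey);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return table;
    }
}
```

Calling RowLockSketch.runDemo(4, 1000) leaves each of the two rows with a count of 2000, since no increment is lost while the row lock is held.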

For regular writes, we group the mutations on the multiple rows into one (see 
IndexRegionObserver.groupMutations()); however, we do not do this for replay 
writes, at least for the new design. In fact, we break individual data 
mutations into multiple mutations for index mutation preparation to make 
sure that the cells in each mutation have the same timestamp (see 
flattenMutationsByTimestamp(), called in IndexRegionObserver.groupMutations()). 
This was the other reason I did not want to use this optimization for replay 
writes. I will look into your patch more and try to merge it.
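
The idea of splitting one mutation so that every resulting mutation carries a single timestamp can be sketched as follows. This is not the body of flattenMutationsByTimestamp(); FlattenSketch, its nested Cell class and flattenByTimestamp() are hypothetical, and a mutation is modeled simply as a list of cells.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FlattenSketch {
    // Hypothetical stand-in for a cell inside a data mutation.
    static final class Cell {
        final String column;
        final long timestamp;

        Cell(String column, long timestamp) {
            this.column = column;
            this.timestamp = timestamp;
        }
    }

    // Split the cells of one mutation into groups keyed by timestamp, so that
    // every group could become a separate mutation with a uniform timestamp.
    // A TreeMap keeps the groups ordered from oldest to newest.
    static Map<Long, List<Cell>> flattenByTimestamp(List<Cell> mutationCells) {
        Map<Long, List<Cell>> byTimestamp = new TreeMap<>();
        for (Cell cell : mutationCells) {
            byTimestamp.computeIfAbsent(cell.timestamp, ts -> new ArrayList<>()).add(cell);
        }
        return byTimestamp;
    }
}
```

A mutation with cells at timestamps 10, 20 and 10 would yield two groups: one with the two cells at 10 and one with the single cell at 20.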

> Batched, mutable Index updates are unnecessarily run one-by-one
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-5494
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5494
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Lars Hofhansl
>            Assignee: Kadir OZDEMIR
>            Priority: Major
>              Labels: performance
>         Attachments: 5494-4.x-HBase-1.5.txt, 
> PHOENIX-5494-4.x-HBase-1.4.patch, PHOENIX-5494.master.001.patch, 
> PHOENIX-5494.master.002.patch, PHOENIX-5494.master.003.patch, 
> Screenshot_20191110_160243.png, Screenshot_20191110_160351.png, 
> Screenshot_20191110_161453.png
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I just noticed that index updates on mutable tables retrieve their deletes 
> (to invalidate the old index entry) one-by-one.
> For batches, this can be *the* major time spent during an index update. The 
> cost is mostly incurred by the repeated setup (and seeking) of the new region 
> scanner (for each row).
> We can instead do a skip scan and get all updates in a single scan per region.
> (Logically that is simple, but it will require some refactoring)
> I won't be getting to this, but recording it here in case someone feels 
> inclined.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
