[jira] [Commented] (PHOENIX-5494) Batched, mutable Index updates are unnecessarily run one-by-one

chenglei (Jira) Sun, 17 Nov 2019 08:52:45 -0800


    [ 
https://issues.apache.org/jira/browse/PHOENIX-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976079#comment-16976079
 ]


chenglei commented on PHOENIX-5494:
-----------------------------------

[~kozdemir], thank you for the detailed explanation.

bq. we only need to ensure that we can add and remove entries for different 
rows to/from the map of mutations. And for this, I used  ConcurrentHashMap (see 
LocalTable.scanCurrentRowStates() where results = new ConcurrentHashMap<>()). 
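
A minimal sketch of the property quoted above, assuming plain String row keys in place of {{ImmutableBytesPtr}}. Several threads concurrently put and remove entries for different rows in one {{ConcurrentHashMap}}, and no update is lost. The class and method names are hypothetical, not Phoenix code. Note that this only covers operations on a map that already exists. It says nothing about how the map itself is created and published.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentRowMapDemo {
    // Hypothetical stand-in for the per-row "current state" map; keys are row keys.
    static Map<String, String> fillConcurrently(int threads, int rowsPerThread) throws InterruptedException {
        // The map is created once, before any worker starts, so all workers share the same instance.
        Map<String, String> results = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int r = 0; r < rowsPerThread; r++) {
                    String row = "t" + id + "-row" + r; // each thread touches distinct rows
                    results.put(row, "state");
                    if (r % 2 == 1) {
                        results.remove(row); // exercise concurrent removal as well
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        // 4 threads, each keeping the 500 even-numbered rows out of 1000
        System.out.println(fillConcurrently(4, 1000).size()); // prints 2000
    }
}
{code}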

Once again, I recommend that we not make {{LocalTable}} a global, region-scoped 
stateful variable; a local variable is better because it avoids concurrency 
issues. For example, in your latest 003 patch:
{code:java}
  public void scanCurrentRowStates(Set<ImmutableBytesPtr> rows, Collection<? extends ColumnReference> columns, long ts) throws IOException {
      // Check-then-act on a shared, non-volatile field: not thread-safe
      if (results == null) {
          results = new ConcurrentHashMap<>();
      }
      // ... (rest of the method)
  }
{code}

If two threads enter this method concurrently, both may see {{results}} as null 
and each create its own {{ConcurrentHashMap}}. And because the {{results}} member 
variable is not declared {{volatile}}, a map assigned by one thread is not 
guaranteed to be visible to the other, so the two threads may end up working on 
different {{ConcurrentHashMap}} instances. We should avoid these concurrency 
issues entirely.
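
A minimal sketch of the local-variable alternative, under stated assumptions: row keys are Strings, the per-row lookup is a placeholder, and {{LocalScanStateSketch}} / {{scanCurrentRowStates}} here are hypothetical and do not match the signature in the patch. Because each call builds and returns its own map, no field is shared across threads, so neither the lazy-initialization race nor the {{volatile}} visibility question arises.

{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class LocalScanStateSketch {
    // Hypothetical replacement for a region-scoped field: the map lives only for one call.
    static Map<String, String> scanCurrentRowStates(Set<String> rows, long ts) {
        Map<String, String> results = new HashMap<>(); // local variable, confined to the calling thread
        for (String row : rows) {
            // Placeholder for the real pre-scan lookup of the row's current state.
            results.put(row, "state@" + ts);
        }
        // The caller owns the result; no shared field is read or written.
        return Collections.unmodifiableMap(results);
    }

    public static void main(String[] args) throws InterruptedException {
        // Two concurrent callers each get their own, independent result map.
        Thread t1 = new Thread(() -> System.out.println(scanCurrentRowStates(Set.of("r1", "r2"), 10L)));
        Thread t2 = new Thread(() -> System.out.println(scanCurrentRowStates(Set.of("r3"), 20L)));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
{code}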

bq. Actually, we break individual data mutations into multiple mutations for the 
index mutation preparation to make sure that the cells in each mutation have 
the same timestamp (see flattenMutationsByTimestamp() called in 
IndexRegionObserver.groupMutations()) for replay writes. 

So when we get the cells from the pre-scan results, we could filter out the cells 
whose timestamps are newer than the data table mutation's timestamp.
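
A minimal sketch of that filter, assuming a simplified {{SimpleCell}} class as a stand-in for HBase's {{Cell}} (which exposes its timestamp via {{getTimestamp()}}), and reading "newer than the data table mutation" as a strictly greater timestamp, so cells with an equal timestamp are kept. All names here are hypothetical.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class TimestampFilterSketch {
    // Minimal stand-in for an HBase Cell: only the fields this example needs.
    static final class SimpleCell {
        final String row;
        final String qualifier;
        final long timestamp;

        SimpleCell(String row, String qualifier, long timestamp) {
            this.row = row;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return row + "/" + qualifier + "@" + timestamp;
        }
    }

    // Keep only pre-scan cells that are not newer than the (flattened) data table mutation.
    static List<SimpleCell> visibleTo(List<SimpleCell> preScanCells, long mutationTs) {
        List<SimpleCell> kept = new ArrayList<>();
        for (SimpleCell c : preScanCells) {
            if (c.timestamp <= mutationTs) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<SimpleCell> cells = List.of(
                new SimpleCell("r1", "v1", 100L),
                new SimpleCell("r1", "v1", 200L),  // newer than the mutation below: filtered out
                new SimpleCell("r1", "v2", 150L));
        System.out.println(visibleTo(cells, 150L)); // prints [r1/v1@100, r1/v2@150]
    }
}
{code}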





> Batched, mutable Index updates are unnecessarily run one-by-one
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-5494
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5494
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Lars Hofhansl
>            Assignee: Kadir OZDEMIR
>            Priority: Major
>              Labels: performance
>         Attachments: 5494-4.x-HBase-1.5.txt, 
> PHOENIX-5494-4.x-HBase-1.4.patch, PHOENIX-5494.master.001.patch, 
> PHOENIX-5494.master.002.patch, PHOENIX-5494.master.003.patch, 
> Screenshot_20191110_160243.png, Screenshot_20191110_160351.png, 
> Screenshot_20191110_161453.png
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I just noticed that index updates on mutable tables retrieve their deletes 
> (to invalidate the old index entry) one-by-one.
> For batches, this can be *the* major time spent during an index update. The 
> cost is mostly incurred by the repeated setup (and seeking) of the new region 
> scanner (for each row).
> We can instead do a skip scan and get all updates in a single scan per region.
> (Logically that is simple, but it will require some refactoring)
> I won't be getting to this, but recording it here in case someone feels 
> inclined.
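
A toy sketch of the difference described in the issue, using an in-memory {{TreeMap}} as a hypothetical stand-in for a region. {{perRowLookup}} does one independent lookup per row (the analogue of setting up a scanner per row), while {{batchedLookup}} walks the sorted batch and a single cursor over the covered key range in one pass. This is a merge-join simplification of a skip scan: a real skip scan seeks over the gaps between requested rows instead of visiting them, and none of these names are Phoenix or HBase APIs.

{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class BatchedRowLookupSketch {
    // One independent lookup per requested row (analogue of one scanner per row).
    static List<String> perRowLookup(NavigableMap<String, String> region, SortedSet<String> rows) {
        List<String> found = new ArrayList<>();
        for (String r : rows) {
            String v = region.get(r);
            if (v != null) {
                found.add(v);
            }
        }
        return found;
    }

    // A single ordered pass: one cursor over the key range covered by the batch.
    static List<String> batchedLookup(NavigableMap<String, String> region, SortedSet<String> rows) {
        List<String> found = new ArrayList<>();
        if (rows.isEmpty()) {
            return found;
        }
        Iterator<Map.Entry<String, String>> cursor =
                region.subMap(rows.first(), true, rows.last(), true).entrySet().iterator();
        Iterator<String> wanted = rows.iterator();
        String next = wanted.next();
        while (next != null && cursor.hasNext()) {
            Map.Entry<String, String> e = cursor.next();
            // Skip requested rows that sort before the current region row (they do not exist).
            while (next != null && next.compareTo(e.getKey()) < 0) {
                next = wanted.hasNext() ? wanted.next() : null;
            }
            if (next != null && next.equals(e.getKey())) {
                found.add(e.getValue());
                next = wanted.hasNext() ? wanted.next() : null;
            }
            // Otherwise this region row was not requested; advance the cursor.
        }
        return found;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> region = new TreeMap<>();
        for (String k : new String[] {"a", "b", "c", "d", "e"}) {
            region.put(k, k.toUpperCase());
        }
        SortedSet<String> rows = new TreeSet<>(List.of("b", "d", "z"));
        System.out.println(perRowLookup(region, rows));  // prints [B, D]
        System.out.println(batchedLookup(region, rows)); // prints [B, D]
    }
}
{code}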



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
