[
https://issues.apache.org/jira/browse/PHOENIX-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976079#comment-16976079
]
chenglei commented on PHOENIX-5494:
-----------------------------------
[~kozdemir], thank you for the detailed explanation.
bq. we only need to ensure that we can add and remove entries for different
rows to/from the map of mutations. And for this, I used ConcurrentHashMap (see
LocalTable.scanCurrentRowStates() where results = new ConcurrentHashMap<>()).
Once again, I recommend that we not make {{LocalTable}} a global, region-scoped
stateful variable; a local variable is better for avoiding concurrency
issues. For example, in your latest 003 patch:
{code:java}
public void scanCurrentRowStates(Set<ImmutableBytesPtr> rows,
        Collection<? extends ColumnReference> columns, long ts) throws IOException {
    if (results == null) {
        results = new ConcurrentHashMap<>();
    }
    // ... (rest of the method omitted)
}
{code}
If two threads enter this method concurrently, both may see {{results}} as
{{null}} and each create its own {{ConcurrentHashMap}}. Because the {{results}}
member variable is not declared {{volatile}}, one thread is also not guaranteed
to see the other's assignment, so the two threads may end up working on separate
{{ConcurrentHashMap}} instances. We should avoid these annoying concurrency
issues completely.
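To illustrate the suggestion, here is a minimal sketch of two ways to avoid the lazy check-then-create on a shared field. It is not taken from the patch: {{RowStateScanSketch}}, {{EagerHolder}}, and the {{String}} key/value types are hypothetical stand-ins for {{LocalTable}} and its real types, and the "scan" is a placeholder.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for LocalTable; names and types are illustrative only.
public class RowStateScanSketch {

    // Option 1: keep the map local to the call and return it.
    // Nothing is shared between callers, so there is no field to race on.
    public static Map<String, String> scanCurrentRowStates(Set<String> rows) {
        Map<String, String> results = new ConcurrentHashMap<>();
        for (String row : rows) {
            // Placeholder for the real region scan of the row's current state.
            results.put(row, "state-of-" + row);
        }
        return results;
    }

    // Option 2: if a field is really needed, create it eagerly and make it final.
    // A final field is safely published once the constructor finishes,
    // so no null check and no volatile are required.
    public static class EagerHolder {
        private final Map<String, String> results = new ConcurrentHashMap<>();

        public Map<String, String> get() {
            return results;
        }
    }
}
```

With option 1, each batch gets its own map and the caller decides how long it lives; option 2 keeps the shared field but removes the unsynchronized lazy initialization.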
bq.Actually, we break individual data mutations into multiple mutations for the
index mutation preparation to make sure that the cells in each mutation have
the same timestamp (see flattenMutationsByTimestamp() called in
IndexRegionObserver.groupMutations()) for replay writes.
So when we get the cells from the pre-scan results, we could filter out the
cells whose timestamps are newer than that of the data table mutation.
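A rough sketch of that filtering step follows. It uses a hypothetical {{SimpleCell}} class instead of the HBase {{Cell}} type so that it is self-contained, and it assumes that cells with a timestamp equal to the mutation's timestamp are kept; whether that boundary is right for replay writes would need to be checked against the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class PreScanFilterSketch {

    // Hypothetical minimal cell; a stand-in for an HBase Cell.
    public static final class SimpleCell {
        public final String qualifier;
        public final long timestamp;

        public SimpleCell(String qualifier, long timestamp) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    // Drops every pre-scanned cell that is newer than the data table mutation.
    // Assumption: cells at exactly mutationTs are kept.
    public static List<SimpleCell> visibleAt(List<SimpleCell> scanned, long mutationTs) {
        List<SimpleCell> kept = new ArrayList<>();
        for (SimpleCell cell : scanned) {
            if (cell.timestamp <= mutationTs) {
                kept.add(cell);
            }
        }
        return kept;
    }
}
```

Since the mutations are already flattened so that each one carries a single timestamp, this filter could be applied once per mutation against the shared pre-scan results.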
> Batched, mutable Index updates are unnecessarily run one-by-one
> ---------------------------------------------------------------
>
> Key: PHOENIX-5494
> URL: https://issues.apache.org/jira/browse/PHOENIX-5494
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Lars Hofhansl
> Assignee: Kadir OZDEMIR
> Priority: Major
> Labels: performance
> Attachments: 5494-4.x-HBase-1.5.txt,
> PHOENIX-5494-4.x-HBase-1.4.patch, PHOENIX-5494.master.001.patch,
> PHOENIX-5494.master.002.patch, PHOENIX-5494.master.003.patch,
> Screenshot_20191110_160243.png, Screenshot_20191110_160351.png,
> Screenshot_20191110_161453.png
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> I just noticed that index updates on mutable tables retrieve their deletes
> (to invalidate the old index entry) one-by-one.
> For batches, this can be *the* major time spent during an index update. The
> cost is mostly incurred by the repeated setup (and seeking) of the new region
> scanner (for each row).
> We can instead do a skip scan and get all updates in a single scan per region.
> (Logically that is simple, but it will require some refactoring)
> I won't be getting to this, but recording it here in case someone feels
> inclined.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)