[ https://issues.apache.org/jira/browse/PHOENIX-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212060#comment-17212060 ]
ASF GitHub Bot commented on PHOENIX-6160:
-----------------------------------------
kadirozde commented on a change in pull request #897:
URL: https://github.com/apache/phoenix/pull/897#discussion_r502979620
##########
File path: phoenix-core/src/main/java/org/apache/phoenix/hbase/index/IndexRegionObserver.java
##########
@@ -166,12 +175,36 @@ public static void setFailDataTableUpdatesForTesting(boolean fail) {
private HashSet<ImmutableBytesPtr> rowsToLock = new HashSet<>();
// The current and next states of the data rows corresponding to the pending mutations
private HashMap<ImmutableBytesPtr, Pair<Put, Put>> dataRowStates;
- // Data table pending mutations
+ // The previous concurrent batch contexts
+ private HashMap<ImmutableBytesPtr, BatchMutateContext> lastConcurrentBatchContext = null;
+ // The latches of the threads waiting for this batch to complete
+ private List<CountDownLatch> waitList = null;
private Map<ImmutableBytesPtr, MultiMutation> multiMutationMap;
private BatchMutateContext(int clientVersion) {
this.clientVersion = clientVersion;
}
+
+ public BatchMutatePhase getCurrentPhase() {
+ return currentPhase;
+ }
+
+ public Put getNextDataRowState(ImmutableBytesPtr rowKeyPtr) {
+ Pair<Put, Put> rowState = dataRowStates.get(rowKeyPtr);
+ if (rowState != null) {
+ return dataRowStates.get(rowKeyPtr).getSecond();
Review comment:
> > Algorithm makes sense, it's a nice simplification, and just had a few nits on the code. My bigger concern is the potential increase in tail latency on writes to hot rows in write latency-sensitive applications. Would be good to get some perf numbers.
>
> Good suggestion. I will do some perf runs and update the Jira.
@gjacoby126, I have updated the design doc with the performance testing results (https://docs.google.com/document/d/12H_MwsPtyM0ORiBHclBpBLZWtm4zpY_cc5y_pwtgMUk/edit#heading=h.yt8378ps0k6e).
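The fields added in this hunk (`lastConcurrentBatchContext` and a `waitList` of `CountDownLatch`) together with `getNextDataRowState` hint at how concurrent batches touching the same rows are ordered, and why the tail-latency concern about hot rows comes up. The Java below is a minimal sketch under stated assumptions, not the code in pull request #897: `BatchRegistry`, `Batch`, `register`, `complete` and `currentState` are hypothetical names, rows and values are plain strings, and a later batch is assumed to (a) treat an earlier in-flight batch's next row state as its own current state and (b) wait on a latch that each earlier overlapping batch counts down when it completes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of ordering concurrent batches on shared rows.
// Not Phoenix's IndexRegionObserver/BatchMutateContext implementation.
class BatchRegistry {

    static final class Batch {
        final Set<String> rowKeys;
        // Next state of each row once this batch is applied (cf. dataRowStates).
        final Map<String, String> nextRowStates = new ConcurrentHashMap<>();
        // Earlier in-flight batch per row (cf. lastConcurrentBatchContext).
        final Map<String, Batch> previousByRow = new ConcurrentHashMap<>();
        // Latches of later batches waiting for this one (cf. waitList);
        // only touched while holding the registry's lock.
        final List<CountDownLatch> waitList = new ArrayList<>();

        Batch(Set<String> rowKeys) {
            this.rowKeys = rowKeys;
        }
    }

    // Latest in-flight batch for each row key.
    private final Map<String, Batch> lastBatchByRow = new HashMap<>();

    // Registers a batch. The returned latch reaches zero once every earlier
    // in-flight batch sharing a row with it has completed.
    synchronized CountDownLatch register(Batch batch) {
        Set<Batch> previous = new LinkedHashSet<>();
        for (String row : batch.rowKeys) {
            Batch prev = lastBatchByRow.put(row, batch);
            if (prev != null) {
                batch.previousByRow.put(row, prev);
                previous.add(prev);
            }
        }
        // One count per distinct earlier batch; each counts down exactly once.
        CountDownLatch latch = new CountDownLatch(previous.size());
        for (Batch prev : previous) {
            prev.waitList.add(latch);
        }
        return latch;
    }

    // Marks a batch complete and releases every batch waiting on it.
    synchronized void complete(Batch batch) {
        for (String row : batch.rowKeys) {
            lastBatchByRow.remove(row, batch); // only if no later batch replaced it
        }
        for (CountDownLatch latch : batch.waitList) {
            latch.countDown();
        }
        batch.waitList.clear();
    }

    // Current state of a row as seen by a batch: the earlier batch's pending
    // next state if there is one (cf. getNextDataRowState), else the committed value.
    static String currentState(Batch batch, String row, Map<String, String> committed) {
        Batch prev = batch.previousByRow.get(row);
        return prev != null ? prev.nextRowStates.get(row) : committed.get(row);
    }

    public static void main(String[] args) throws InterruptedException {
        BatchRegistry registry = new BatchRegistry();
        Map<String, String> committed = Map.of("hot", "v0");

        Batch first = new Batch(Set.of("hot"));
        registry.register(first);
        first.nextRowStates.put("hot", "v1");

        Batch second = new Batch(Set.of("hot"));
        CountDownLatch latch = registry.register(second);
        System.out.println("second sees: " + currentState(second, "hot", committed));

        Thread waiter = new Thread(() -> {
            try {
                // The second batch blocks until the first completes.
                boolean done = latch.await(5, TimeUnit.SECONDS);
                System.out.println("second may proceed: " + done);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        registry.complete(first);
        waiter.join();
    }
}
```

Running `main` prints `second sees: v1` followed by `second may proceed: true`. On a hot row, each new batch queues behind the previous one in this way, which is where the extra tail write latency raised in the review would come from.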
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Simplifying concurrent mutation handling for global Indexes
> -----------------------------------------------------------
>
> Key: PHOENIX-6160
> URL: https://issues.apache.org/jira/browse/PHOENIX-6160
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 5.0.0, 4.15.0
> Reporter: Kadir OZDEMIR
> Assignee: Kadir OZDEMIR
> Priority: Major
> Attachments: PHOENIX-6160.4.x.001.patch
>
>
> Please see the attached design document for the proposed simplification. The
> proposed design is simpler to understand and does not require special
> handling of concurrent partial updates that do not include indexed columns.
> One of the desired features for global indexes is support for atomic
> operations (ON_DUPLICATE_KEY statements). We have found it quite difficult
> to build such a feature on the current design, because it would require
> adding more special cases to handle data table update ordering issues. The
> proposed design does not require changes to concurrent mutation handling to
> support such features.
> The proposed design nearly eliminates unverified index rows caused by
> concurrent mutations: index rows are left unverified only when a batch fails
> to complete its data table updates. This improves read performance, because
> repairing unverified rows is costly; each row repair adds several tens of
> milliseconds to the overall scan latency.
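To illustrate why fewer unverified index rows translates into faster reads, here is a minimal Java sketch. It is a toy model under stated assumptions, not Phoenix code: `UnverifiedRowModel`, `write`, `readViaIndex` and `repairCount` are hypothetical, both tables are in-memory maps, an index row is assumed to be marked unverified before the data table update and verified after it, and the repair step (mark verified if the data row exists, otherwise drop the index row) stands in for whatever repair the real read path performs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of index row verification status; not Phoenix code.
// Each write marks the index row UNVERIFIED, applies the data row, then marks
// the index row VERIFIED. A failed data write leaves it UNVERIFIED.
class UnverifiedRowModel {

    enum Status { UNVERIFIED, VERIFIED }

    private final Map<String, String> dataTable = new HashMap<>();
    private final Map<String, Status> indexStatus = new HashMap<>();
    private int repairs = 0;

    // Returns false when the (simulated) data table update fails.
    boolean write(String row, String value, boolean dataWriteFails) {
        indexStatus.put(row, Status.UNVERIFIED); // before the data table update
        if (dataWriteFails) {
            return false; // batch did not complete: index row stays unverified
        }
        dataTable.put(row, value);
        indexStatus.put(row, Status.VERIFIED); // after the data table update
        return true;
    }

    // Reads through the index. Verified rows are used as-is; unverified rows
    // must first be repaired against the data table, which is the costly path.
    String readViaIndex(String row) {
        Status status = indexStatus.get(row);
        if (status == null) {
            return null;
        }
        if (status == Status.UNVERIFIED) {
            repairs++;
            if (!dataTable.containsKey(row)) {
                indexStatus.remove(row); // no data row: drop the stale index row
                return null;
            }
            indexStatus.put(row, Status.VERIFIED);
        }
        return dataTable.get(row);
    }

    int repairCount() {
        return repairs;
    }

    public static void main(String[] args) {
        UnverifiedRowModel model = new UnverifiedRowModel();
        model.write("a", "1", false); // completes
        model.write("b", "2", true);  // fails before the data table update
        System.out.println(model.readViaIndex("a") + " " + model.readViaIndex("b")
                + " repairs=" + model.repairCount());
    }
}
```

Running `main` prints `1 null repairs=1`: only the row whose batch failed pays for a repair, which is the cost the proposed design aims to avoid for rows touched by merely concurrent (rather than failed) batches.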
--
This message was sent by Atlassian Jira
(v8.3.4#803005)