[ https://issues.apache.org/jira/browse/PHOENIX-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321569#comment-16321569 ]
Vincent Poon commented on PHOENIX-4130:
---------------------------------------

[~jamestaylor] Thanks for the helpful feedback. Regarding PENDING_DISABLE, could we instead just use PENDING_ACTIVE but interpret it based on the "disableIndexOnWriteFailure" config setting? I'm envisioning the following:

In PhoenixIndexFailurePolicy, we always set the index state to PENDING_ACTIVE (unless !disableIndexOnFailure && !rebuildIndexOnFailure, in which case we do nothing).
* If disableIndexOnFailure=false, the client will not disable the index. The rebuilder will run normally.
* If disableIndexOnFailure=true, the client will disable the index after its retries are exhausted. However, we put an additional check in the MetaDataRegionObserver rebuilder: if (disableIndexOnFailure=true && indexState=PENDING_ACTIVE) AND the time elapsed since indexDisableTimestamp is greater than ~15 seconds, then we know the client didn't do its job for whatever reason, and we mark the index disabled. The next run of the rebuilder will then proceed just as if the index had been disabled by the client.

The downside is that there's a potential window of 75 seconds before the index is marked disabled (if the rebuilder starts just before the index is disabled), or perhaps even longer if the rebuilder is busy rebuilding some other index. However, it's an improvement on the 30-minute DEFAULT_INDEX_REBUILD_DISABLE_TIMESTAMP_THRESHOLD, while keeping the patch somewhat simpler. WDYT?
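A minimal Java sketch of the state transitions proposed above. It assumes a simplified three-value IndexState enum (a stand-in, not Phoenix's actual PIndexState) and a hypothetical PendingActiveProposal class; only PENDING_ACTIVE, disableIndexOnFailure, rebuildIndexOnFailure and indexDisableTimestamp come from the comment, and the grace period is the "~15 seconds" figure taken literally.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of the PENDING_ACTIVE proposal. Names other than the
 * flags and PENDING_ACTIVE are illustrative, not Phoenix APIs.
 */
final class PendingActiveProposal {

    /** Simplified stand-in for Phoenix's index states (not the real PIndexState). */
    enum IndexState { ACTIVE, PENDING_ACTIVE, DISABLE }

    /** Assumed grace period ("~15 seconds") before the rebuilder steps in. */
    static final long CLIENT_GRACE_MS = TimeUnit.SECONDS.toMillis(15);

    private PendingActiveProposal() {}

    /**
     * Server side, on an index write failure: always move to PENDING_ACTIVE,
     * unless neither disabling nor rebuilding is enabled, in which case do nothing.
     */
    static IndexState onIndexWriteFailure(IndexState current,
                                          boolean disableIndexOnFailure,
                                          boolean rebuildIndexOnFailure) {
        if (!disableIndexOnFailure && !rebuildIndexOnFailure) {
            return current; // leave the state untouched
        }
        return IndexState.PENDING_ACTIVE;
    }

    /** Client side, once its retries are exhausted: disable only when configured to. */
    static IndexState onClientRetriesExhausted(IndexState current,
                                               boolean disableIndexOnFailure) {
        return disableIndexOnFailure ? IndexState.DISABLE : current;
    }

    /**
     * Rebuilder safety net: the client should have disabled the index, but it is
     * still PENDING_ACTIVE well after the failure, so assume the client never did.
     */
    static boolean rebuilderShouldDisable(boolean disableIndexOnFailure,
                                          IndexState state,
                                          long indexDisableTimestamp,
                                          long nowMs) {
        return disableIndexOnFailure
                && state == IndexState.PENDING_ACTIVE
                && nowMs - indexDisableTimestamp > CLIENT_GRACE_MS;
    }
}
```

With disableIndexOnFailure=false, rebuilderShouldDisable never fires, so the rebuilder treats a PENDING_ACTIVE index normally; the extra check only matters when the client was supposed to disable the index and evidently did not.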
> Avoid server retries for mutable indexes
> ----------------------------------------
>
> Key: PHOENIX-4130
> URL: https://issues.apache.org/jira/browse/PHOENIX-4130
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Lars Hofhansl
> Assignee: Vincent Poon
> Fix For: 4.14.0
>
> Attachments: PHOENIX-4130.v1.master.patch
>
>
> Had some discussions with [~jamestaylor], [~samarthjain], and [~vincentpoon], during which I suggested that we can possibly eliminate retry loops happening at the server that cause the handler threads to be stuck potentially for quite a while (at least multiple seconds to ride over common scenarios like splits).
> Instead, we can do the retries at the Phoenix client.
> So:
> # The index updates are not retried on the server (retries = 0).
> # A failed index update would set the failed index timestamp but leave the index enabled.
> # Now the handler thread is done; it throws an appropriate exception back to the client.
> # The Phoenix client can now retry. When those retries fail, the index is disabled (if the policy dictates that) and the exception is thrown back to its caller.
> So no more waiting is needed on the server; handler threads are freed immediately.
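The client-side retry flow in the description can be sketched as follows. This is not Phoenix's actual client code: ServerCall (one server round trip with server retries = 0), IndexWriteFailedException (unchecked here for brevity), and the maxRetries, pauseMs and disableIndex parameters are all hypothetical.

```java
/**
 * Hypothetical sketch: the server makes a single attempt and fails fast,
 * and all retrying happens on the client.
 */
final class ClientSideIndexRetry {

    /** Hypothetical exception a server handler throws after one failed index write. */
    static final class IndexWriteFailedException extends RuntimeException {
        IndexWriteFailedException(String message) { super(message); }
    }

    /** One server round trip: succeeds or throws immediately (no server-side retries). */
    @FunctionalInterface
    interface ServerCall {
        void attempt();
    }

    private ClientSideIndexRetry() {}

    /**
     * Makes one attempt plus up to maxRetries retries. If every attempt fails,
     * the index is disabled (only when the policy says so) and the last failure
     * is rethrown to the caller.
     *
     * @return the number of attempts it took when the call eventually succeeds
     */
    static int executeWithRetries(ServerCall call,
                                  int maxRetries,
                                  long pauseMs,
                                  boolean disableIndexOnFailure,
                                  Runnable disableIndex) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        IndexWriteFailedException last = null;
        for (int attempt = 1; attempt <= maxRetries + 1; attempt++) {
            try {
                call.attempt();
                return attempt; // success: report how many attempts it took
            } catch (IndexWriteFailedException e) {
                last = e; // server handler already returned; nothing is blocked there
            }
            if (attempt <= maxRetries) {
                try {
                    Thread.sleep(pauseMs); // the client waits, not a server handler thread
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    break; // stop retrying and fall through to failure handling
                }
            }
        }
        if (disableIndexOnFailure) {
            disableIndex.run(); // e.g. mark the index disabled, per policy
        }
        throw last; // non-null: we only get here after at least one failure
    }
}
```

Because the pause happens on the client, a transient failure such as a region split only delays the caller; no server handler thread is held while waiting.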