[ https://issues.apache.org/jira/browse/PHOENIX-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341590#comment-16341590 ]
ASF GitHub Bot commented on PHOENIX-4130:
-----------------------------------------

Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/290#discussion_r164220749

    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/hbase/index/write/IndexWriterUtils.java ---
    @@ -70,13 +70,13 @@
         public static final String HTABLE_KEEP_ALIVE_KEY = "hbase.htable.threads.keepalivetime";

         public static final String INDEX_WRITER_RPC_RETRIES_NUMBER = "phoenix.index.writes.rpc.retries.number";
    -    /**
    -     * Based on the logic in HBase's AsyncProcess, a default of 11 retries with a pause of 100ms
    -     * approximates 48 sec total retry time (factoring in backoffs).  The total time should be less
    -     * than HBase's rpc timeout (default of 60 sec) or else the client will retry before receiving
    -     * the response
    -     */
    -    public static final int DEFAULT_INDEX_WRITER_RPC_RETRIES_NUMBER = 11;
    +    /**
    +     * Retry server-server index write rpc only once, and let the client retry the data write
    +     * instead to avoid typing up the handler
    +     */
    +    // note in HBase 2+, numTries = numRetries + 1
    +    // in prior versions, numTries = numRetries
    +    public static final int DEFAULT_INDEX_WRITER_RPC_RETRIES_NUMBER = 1;
    --- End diff --

    I was thinking it'd be ok if the onus was on the operator. I think we're in the minority in having a mix of old/new clients with zero downtime. If you think it's worth checking the client version (or checking if upgraded yet) and can come up with a reasonable way of doing that, that's fine too. Not sure it's worth the effort, though. Here's some ideas:
    * add client version to IndexMaintainer (which is protobuf-ed, so it'd be doable). Then use the "correct" IndexBuilder (or config?) based on the client version pulled out of the IndexMaintainer. If you have separate IndexBuilder instances, then you'd get duplicate thread pools to do the writing too which isn't ideal. Maybe within the parallel writer you could use the correct config?
    * less dynamic, but check the timestamp of system.catalog to know what version you're at. Somewhat scary to have an RPC in the start path of the coprocessor, though, as if system.catalog RS can't be reached, then you're kind of screwed (but maybe you are already anyway).
    * least dynamic - put onus on operator and change to setIfUnset.


> Avoid server retries for mutable indexes
> ----------------------------------------
>
>                 Key: PHOENIX-4130
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4130
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Lars Hofhansl
>            Assignee: Vincent Poon
>            Priority: Major
>             Fix For: 4.14.0
>
>         Attachments: PHOENIX-4130.v1.master.patch, PHOENIX-4130.v2.master.patch, PHOENIX-4130.v3.master.patch
>
>
> Had some discussions with [~jamestaylor], [~samarthjain], and [~vincentpoon], during which I suggested that we can possibly eliminate retry loops happening at the server that cause the handler threads to be stuck potentially for quite a while (at least multiple seconds to ride over common scenarios like splits).
> Instead we can do the retries at the Phoenix client.
> So:
> # The index updates are not retried on the server. (retries = 0)
> # A failed index update would set the failed index timestamp but leave the index enabled.
> # Now the handler thread is done, it throws an appropriate exception back to the client.
> # The Phoenix client can now retry. When those retries fail the index is disabled (if the policy dictates that) and throw the exception back to its caller.
> So no more waiting is needed on the server, handler threads are freed immediately.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
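
For reference, the "48 sec" figure in the removed Javadoc and the numTries note in the new comment can be reproduced with a small sketch. This is only an approximation under stated assumptions: the multiplier table is what I recall of HBase's `HConstants.RETRY_BACKOFF` (worth verifying against your HBase version), jitter and actual RPC time are ignored, and each retry is assumed to be preceded by exactly one pause. The class and method names (`RetryBudgetSketch`, `totalPauseMs`, `numTries`) are hypothetical and not part of Phoenix or HBase.

```java
class RetryBudgetSketch {
    // Assumed copy of HBase's HConstants.RETRY_BACKOFF multipliers (verify for your version).
    static final int[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    // Sum of the pauses taken before each of `retries` retries, without jitter.
    static long totalPauseMs(long pauseMs, int retries) {
        long total = 0;
        for (int i = 0; i < retries; i++) {
            // Past the end of the table, keep using the last multiplier.
            int idx = Math.min(i, RETRY_BACKOFF.length - 1);
            total += pauseMs * RETRY_BACKOFF[idx];
        }
        return total;
    }

    // Mirrors the comment in the diff: HBase 2+ makes numRetries + 1 attempts,
    // prior versions make numRetries attempts.
    static int numTries(int numRetries, boolean hbase2OrLater) {
        return hbase2OrLater ? numRetries + 1 : numRetries;
    }

    public static void main(String[] args) {
        // Old default: 11 retries at a 100 ms base pause.
        System.out.println(totalPauseMs(100, 11)); // 48100 ms, i.e. about 48 s
        // New default: 1 retry at a 100 ms base pause.
        System.out.println(totalPauseMs(100, 1));  // 100 ms
        System.out.println(numTries(1, true));     // 2 attempts on HBase 2+
        System.out.println(numTries(1, false));    // 1 attempt on prior versions
    }
}
```

Under these assumptions the old default keeps a handler busy for roughly 48 seconds of backoff alone, while the new default spends about 100 ms before handing the failure back to the client.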