[ https://issues.apache.org/jira/browse/PHOENIX-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281234#comment-16281234 ]
James Taylor commented on PHOENIX-4130: --------------------------------------- Thanks for the patch, [~vincentpoon]. I like this approach. A few comments/questions: - Once the index is put in a PENDING_ACTIVE state, who puts it back into an ACTIVE state if the client retries succeed? Are you counting on the partial index rebuilder in MetaDataRegionObserver to do that? The downside of that is that we'd replay some edits that have already been processed, but if we expect this to be rare perhaps that's ok. An alternative would be for Phoenix to handle the client retries and set the index status back to ACTIVE if they succeed. I'm not positive, but we may be able to detect from the client that retries were done and then we could potentially set the index back to ACTIVE and clear the INDEX_DISABLE_TIMESTAMP (FYI, see below for a util function that does this). - One good side effect of putting the index to a PENDING_ACTIVE state is that if the client were to crash during the retries, the partial index rebuilder would do the replay and eventual disable if the replay took too long. If this were to happen, though, do you think the lag time is too long to continue to use the index even though it's out of sync with the client (up to 45 minutes while previously it would have been ~15 seconds)? Another (though more complex) alternative would be to invent another state like PENDING_DISABLE which would act the same as PENDING_ACTIVE except on the client we could treat the index as disabled if the index stays in this state longer than a configurable amount of time (i.e. the same amount of time we were willing to retry on the server side). This would be a much shorter window in this case than the default 45 minutes of keeping the index active until the partial index rebuilder gives up and marks it as permanently disabled. - One corner case that's not handled is if we're updating indexes within the RS. This can occur as an optimization when auto commit is on and a DML statement is operating on a single table (i.e. DELETE FROM T). In this case, the commits occur in UngroupedAggregateRegionObserver.commit(), unfortunately outside of the abstraction we have in MutationState. - We'll want to document that the QueryServices.INDEX_FAILURE_DISABLE_INDEX config property needs to be set both on the client and server side as we're relying on that to determine both the client behaviour and the server behaviour (where before it was only used on the server side). Not a huge deal, but if we can prevent needing to check it client side (maybe based on information thrown back in the exception?), then that'd be an improvement IMHO. - You might want to consider using the IndexUtil.updateIndexState(conn, fullIndexTableName, PIndexState.DISABLE, 0L) method to disable the index instead of using the one you wrote. If you use yours, then you'd want to make sure to put the schema name, table name, and index name in double quotes in case they're case sensitive (as otherwise you'd get a TableNotFoundException). {code} + private void disableIndex(String dataTableFullName, String indexName) throws SQLException { + logger.info( + "Disabling index after hitting max number of index write retries: " + indexName); + String disableIndexDDL = "ALTER INDEX %s ON %s DISABLE"; + connection.createStatement().execute(String.format(disableIndexDDL, + SchemaUtil.getTableNameFromFullName(indexName), dataTableFullName)); + } {code} > Avoid server retries for mutable indexes > ---------------------------------------- > > Key: PHOENIX-4130 > URL: https://issues.apache.org/jira/browse/PHOENIX-4130 > Project: Phoenix > Issue Type: Improvement > Reporter: Lars Hofhansl > Assignee: Vincent Poon > Fix For: 4.14.0 > > Attachments: PHOENIX-4130.v1.master.patch > > > Had some discussions with [~jamestaylor], [~samarthjain], and [~vincentpoon], > during which I suggested that we can possibly eliminate retry loops happening > at the server that cause the handler threads to be stuck potentially for > quite a while (at least multiple seconds to ride over common scenarios like > splits). > Instead we can do the retries at the Phoenix client that. > So: > # The index updates are not retried on the server. (retries = 0) > # A failed index update would set the failed index timestamp but leave the > index enabled. > # Now the handler thread is done, it throws an appropriate exception back to > the client. > # The Phoenix client can now retry. When those retries fail the index is > disabled (if the policy dictates that) and throw the exception back to its > caller. > So no more waiting is needed on the server, handler threads are freed > immediately. -- This message was sent by Atlassian JIRA (v6.4.14#64029)