[ https://issues.apache.org/jira/browse/KUDU-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963431#comment-16963431 ]
Adar Dembo commented on KUDU-2963:
----------------------------------

Thanks for finding that 30s deadline; I didn't notice that and assumed the retrying is indefinite (or "close to indefinite" i.e. 1 hour). I was working off the behavior I saw in CreateTableITest_TestCreateWhenMajorityOfReplicasFailCreation which finishes in under 30s.

> Catalog manager never gives up on CreateTablet RPCs
> ---------------------------------------------------
>
>                 Key: KUDU-2963
>                 URL: https://issues.apache.org/jira/browse/KUDU-2963
>             Project: Kudu
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 1.11.0
>            Reporter: Adar Dembo
>            Assignee: Bankim Bhavsar
>            Priority: Major
>              Labels: newbie
>
> This is a problem when there aren't enough live tservers upon which to place
> a tablet's replicas, or when a chosen tserver doesn't create the replica
> quickly enough. If the catalog manager decides to replace the tablet, the
> replaced tablet's CreateTablet RPCs continue to retry ad infinitum. If the
> previously dead tservers then come back to life, they must needlessly process
> the CreateTablet RPCs.
> The tablets are eventually deleted, either through explicit DeleteTablet RPCs
> (triggered by the catalog manager replacement process), or by heartbeating,
> but it's an unnecessary drain on cluster resources.
> We should probably abort CreateTablet RPCs for tablets that have been removed
> from their table.
> CreateTableITest_TestCreateWhenMajorityOfReplicasFailCreation demonstrates
> this acutely.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
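
The behavior discussed above (retries bounded by a deadline, plus the proposed abort for tablets removed from their table) can be illustrated with a minimal, language-neutral sketch. This is not Kudu's code: `Tablet`, `removed_from_table`, `retry_create_tablet`, and `send_rpc` are all hypothetical names, and the 30-second default only mirrors the figure mentioned in the comment.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tablet:
    """Hypothetical stand-in for a tablet tracked by the catalog manager."""
    tablet_id: str
    removed_from_table: bool = False  # assumed flag: set once the tablet is replaced


def retry_create_tablet(
    tablet: Tablet,
    send_rpc: Callable[[Tablet], bool],
    deadline_secs: float = 30.0,   # assumption: mirrors the 30s figure in the comment
    backoff_secs: float = 0.01,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry send_rpc until it succeeds, the tablet is removed, or the deadline passes.

    Returns "ok", "aborted", or "timed_out".
    """
    deadline = clock() + deadline_secs
    while True:
        # Proposed check: stop retrying for a tablet that has been replaced.
        if tablet.removed_from_table:
            return "aborted"
        # Deadline check: retries are bounded rather than indefinite.
        if clock() >= deadline:
            return "timed_out"
        if send_rpc(tablet):
            return "ok"
        sleep(backoff_secs)
```

Checking the removal flag before each attempt means a tserver that comes back to life stops receiving CreateTablet RPCs for a replaced tablet as soon as the flag is set, instead of only when the deadline expires.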