[ 
https://issues.apache.org/jira/browse/KUDU-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963431#comment-16963431
 ] 

Adar Dembo commented on KUDU-2963:
----------------------------------

Thanks for finding that 30s deadline; I hadn't noticed it and assumed the 
retries were indefinite (or "close to indefinite", i.e. 1 hour). I was working 
off the behavior I saw in 
CreateTableITest_TestCreateWhenMajorityOfReplicasFailCreation, which finishes in 
under 30s.


> Catalog manager never gives up on CreateTablet RPCs
> ---------------------------------------------------
>
>                 Key: KUDU-2963
>                 URL: https://issues.apache.org/jira/browse/KUDU-2963
>             Project: Kudu
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 1.11.0
>            Reporter: Adar Dembo
>            Assignee: Bankim Bhavsar
>            Priority: Major
>              Labels: newbie
>
> This is a problem when there aren't enough live tservers upon which to place 
> a tablet's replicas, or when a chosen tserver doesn't create the replica 
> quickly enough. If the catalog manager decides to replace the tablet, the 
> replaced tablet's CreateTablet RPCs continue to retry ad infinitum. If the 
> previously dead tservers then come back to life, they must needlessly process 
> the CreateTablet RPCs.
> The tablets are eventually deleted, either through explicit DeleteTablet RPCs 
> (triggered by the catalog manager replacement process), or by heartbeating, 
> but it's an unnecessary drain on cluster resources.
> We should probably abort CreateTablet RPCs for tablets that have been removed 
> from their table.
> CreateTableITest_TestCreateWhenMajorityOfReplicasFailCreation demonstrates 
> this acutely.
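
To illustrate the proposal above, here is a minimal sketch of a retry loop that stops at a deadline and also aborts once the tablet is no longer part of its table. It is not Kudu's actual code (Kudu is written in C++). Table, retry_create_tablet, send_rpc, and the returned status strings are hypothetical names chosen for illustration. The 30s default only mirrors the deadline mentioned in the comment.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical stand-in for the catalog manager's table metadata."""
    tablet_ids: set = field(default_factory=set)


def retry_create_tablet(send_rpc, table, tablet_id, deadline_s=30.0,
                        backoff_s=0.01, clock=time.monotonic,
                        sleep=time.sleep):
    """Retry send_rpc() until it succeeds, the tablet is removed from its
    table, or the deadline passes.

    Returns "created", "aborted", or "timed_out".
    """
    deadline = clock() + deadline_s
    while True:
        # Proposed check: give up once the tablet has been replaced and
        # therefore no longer belongs to its table.
        if tablet_id not in table.tablet_ids:
            return "aborted"
        # Existing behavior per the comment: retries are bounded by a deadline.
        if clock() >= deadline:
            return "timed_out"
        # send_rpc is an assumed callable returning True on success.
        if send_rpc():
            return "created"
        sleep(backoff_s)
```

With this shape, replacing a tablet only requires removing its ID from the table; any in-flight retry loop notices on its next iteration instead of waiting for the deadline.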



