Alexey Serbin created KUDU-3124:
-----------------------------------

             Summary: A safer way to handle CreateTablet requests
                 Key: KUDU-3124
                 URL: https://issues.apache.org/jira/browse/KUDU-3124
             Project: Kudu
          Issue Type: Improvement
          Components: master, tserver
    Affects Versions: 1.11.1, 1.11.0, 1.10.1, 1.10.0, 1.9.0, 1.7.1, 1.8.0, 
1.7.0, 1.6.0, 1.5.0, 1.4.0, 1.3.1, 1.3.0, 1.2.0
            Reporter: Alexey Serbin


As of now, catalog manager (a part of kudu-master) sends 
{{CreateTabletRequest}} RPC
as soon as they are realized by {{CatalogManager::ProcessPendingAssignments()}}
when processing the list of deferred DDL operations, and at this level there
isn't any restrictions on how many of those might be in flight or sent to
a particular tablet server (NOTE: there is {{\-\-max_create_tablets_per_ts}} 
flag,
but it works on a higher level and only during initial creation of a table).

The {{CreateTablet}} requests are sent asynchronously, and if the tablet isn't
created within {{\-\-tablet_creation_timeout_ms|| milliseconds, catalog manager
replaces all the tablet replicas, generating a new tablet UUID and sending
corresponding {{CreateTabletRequest}} RPCs to a potentially different set of 
tablet
servers.  Corresponding {{DeleteTabletRequest}} RPCs (to remove the replicas of 
the
stalled-during-creation tablet) are sent separately in an asynchronous way
as well.

There are at least two issues with this approach:

# The {{\-\-max_create_tablets_per_ts}} threshold limits the number of 
concurrent requests hitting one tablet server only during the initial creation 
of a table. However, nothing limits how many requests to create a table replica 
might hit a tablet server when adding partitions to an existing table as a 
result of ALTER TABLE request.
# {{DeleteTabletRequest}} RPCs sometimes might not get into the RPC queues of
corresponding tablet servers, and catalog manager stops retrying sending those
after {{\-\-unresponsive_ts_rpc_timeout_ms}} interval.  This might spiral into 
a situation when requests to create replacement tablet replicas are passing 
through and executed by tablet servers, but corresponding requests to delete 
tablet replica cannot get through because of queue overflows, with catalog 
manager eventually giving up retrying the latter ones.  Eventually, tablet 
servers end up with huge number of tablet replicas created, and they crash 
running out of memory.  The crashed tablet servers cannot start after that 
because they eventually run out of memory trying to bootstrap the huge number 
of tablet replicas (running out of memory again). See 
https://gerrit.cloudera.org/#/c/15912/ for the reproduction scenario and 
[KUDU-2453|https://issues.apache.org/jira/browse/KUDU-2453] for corresponding 
issue reported some time ago. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to