[
https://issues.apache.org/jira/browse/KUDU-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Serbin updated KUDU-3124:
--------------------------------
Summary: A safer way to handle {Create,Delete}Tablet requests (was: A
safer way to handle CreateTablet requests)
> A safer way to handle {Create,Delete}Tablet requests
> ----------------------------------------------------
>
> Key: KUDU-3124
> URL: https://issues.apache.org/jira/browse/KUDU-3124
> Project: Kudu
> Issue Type: Improvement
> Components: master, tserver
> Affects Versions: 1.2.0, 1.3.0, 1.3.1, 1.4.0, 1.5.0, 1.6.0, 1.7.0, 1.8.0,
> 1.7.1, 1.9.0, 1.10.0, 1.10.1, 1.11.0, 1.12.0, 1.11.1, 1.13.0
> Reporter: Alexey Serbin
> Priority: Major
>
> As of now, catalog manager (a part of kudu-master) sends
> {{CreateTabletRequest}} and {{DeleteTabletRequest}} RPCs
> as soon as they are realized by
> {{CatalogManager::ProcessPendingAssignments()}}
> when processing the list of deferred DDL operations, and at this level there
> isn't any restrictions on how many of those might be in flight or sent to
> a particular tablet server (NOTE: there is {{\-\-max_create_tablets_per_ts}}
> flag,
> but it works on a higher level and only during initial creation of a table).
> The {{CreateTablet}} requests are sent asynchronously, and if the tablet isn't
> created within {{\-\-tablet_creation_timeout_ms}} milliseconds, catalog
> manager
> replaces all the tablet replicas, generating a new tablet UUID and sending
> corresponding {{CreateTabletRequest}} RPCs to a potentially different set of
> tablet
> servers. Corresponding {{DeleteTabletRequest}} RPCs (to remove the replicas
> of the
> stalled-during-creation tablet) are sent separately in an asynchronous way
> as well.
> There are at least two issues with this approach:
> # The {{\-\-max_create_tablets_per_ts}} threshold limits the number of
> concurrent requests hitting one tablet server only during the initial
> creation of a table. However, nothing limits how many requests to create a
> table replica might hit a tablet server when adding partitions to an existing
> table as a result of ALTER TABLE request.
> # {{DeleteTabletRequest}} RPCs sometimes might not get into the RPC queues of
> corresponding tablet servers, and catalog manager stops retrying sending those
> after {{\-\-unresponsive_ts_rpc_timeout_ms}} interval. This might spiral
> into a situation when requests to create replacement tablet replicas are
> passing through and executed by tablet servers, but corresponding requests to
> delete tablet replica cannot get through because of queue overflows, with
> catalog manager eventually giving up retrying the latter ones. Eventually,
> tablet servers end up with huge number of tablet replicas created, and they
> crash running out of memory. The crashed tablet servers cannot start after
> that because they eventually run out of memory trying to bootstrap the huge
> number of tablet replicas (running out of memory again). See
> https://gerrit.cloudera.org/#/c/15912/ for the reproduction scenario and
> [KUDU-2453|https://issues.apache.org/jira/browse/KUDU-2453] for corresponding
> issue reported some time ago.
> Ideally, the catalog manager should put all the generated
> {{\{Create,Delete\}TabletRequests}} into per-tserver queues and make sure at
> most {{N}} requests are being processed by a tablet server at a time (the
> {{N}} parameter is a replacement for {{\-\-max_create_tablets_per_ts}} in
> this safer approach).
--
This message was sent by Atlassian Jira
(v8.20.1#820001)