Adar Dembo resolved KUDU-1913.
          Resolution: Fixed
       Fix Version/s: 1.7.0
    Target Version/s: 1.7.0  (was: 1.8.0)

I just merged commit debcb8e, which establishes a cap on the two server-wide 
threadpools that were previously unbounded (the Raft and Prepare pools). 
Based on discussion with David, I set a cap of 10% of RLIMIT_NPROC on each, 
though this is configurable at runtime.
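
To make the arithmetic behind "10% of RLIMIT_NPROC" concrete, here is a rough 
Python sketch. It is not the code from the commit (Kudu itself is C++); the 
function name pool_thread_cap, the percent parameter, and the fallback value 
used when the limit is unlimited are all hypothetical and chosen only for 
illustration.

```python
import resource


def pool_thread_cap(nproc_soft_limit, percent=10, fallback=1024):
    """Return a per-pool thread cap as a percentage of the RLIMIT_NPROC soft limit.

    `fallback` is a hypothetical value used when the limit is unlimited;
    it is an assumption for this sketch, not something Kudu defines here.
    """
    if nproc_soft_limit == resource.RLIM_INFINITY or nproc_soft_limit < 0:
        return fallback
    # Integer arithmetic avoids floating-point rounding surprises.
    # Never return a cap below one thread.
    return max(1, nproc_soft_limit * percent // 100)


if __name__ == "__main__":
    # Query the soft limit for the current process.
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    print("per-pool cap:", pool_thread_cap(soft))
```

With a soft limit of 4096, for example, each of the two pools would be capped 
at 409 threads under this scheme.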

The last remaining per-replica thread is the WAL append thread, and it only 
runs for replicas that are being actively written to. This is by design and 
unlikely to change in the foreseeable future, so I think we can finally resolve 
this JIRA.

> Tablet server runs out of threads when creating lots of tablets
> ---------------------------------------------------------------
>                 Key: KUDU-1913
>                 URL: https://issues.apache.org/jira/browse/KUDU-1913
>             Project: Kudu
>          Issue Type: Sub-task
>          Components: consensus, log
>            Reporter: Juan Yu
>            Assignee: Adar Dembo
>            Priority: Major
>              Labels: data-scalability
>             Fix For: 1.7.0
> When adding lots of range partitions, all tablet servers crashed with the 
> following error:
> F0308 14:51:04.109369 12952 raft_consensus.cc:1985] Check failed: _s.ok() Bad 
> status: Runtime error: Could not create thread: Resource temporarily 
> unavailable (error 11)
> Tablet servers should handle such errors and failures more gracefully 
> instead of crashing.

This message was sent by Atlassian JIRA
