Ivan Andika created HDDS-15330:
----------------------------------

             Summary: Implement SCM FCR rate-limit
                 Key: HDDS-15330
                 URL: https://issues.apache.org/jira/browse/HDDS-15330
             Project: Apache Ozone
          Issue Type: Sub-task
            Reporter: Ivan Andika
            Assignee: Ivan Andika


We have seen previous instances where a newly bootstrapped SCM ran out of 
memory (FYI, the SCM that hit the OOM had a 96 GB heap). We suspect this is 
caused by too many full container reports (FCRs) being processed concurrently 
in SCM. 

HDFS implemented a rate limit for full block reports in HDFS-7923, using 
BlockReportLeaseManager to reduce the number of block reports held 
concurrently in the NameNode. Ozone should implement a similar mechanism to 
prevent FCR storms.

A possible design is to register the DN first, without sending the full FCR 
immediately. SCM then grants only N datanodes permission to send FCRs at any 
one time, similar to the HDFS implementation.
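
To make the idea concrete, here is a minimal sketch of the lease bookkeeping 
on the SCM side. It is only an illustration under assumptions: the class name 
FcrLeaseManager, its method names, and the lease timeout are hypothetical and 
are not existing Ozone or HDFS APIs. A datanode asks for a lease, sends its 
FCR only when it gets a non-zero lease ID, and the slot is freed once SCM has 
finished processing the report or the lease expires.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not an existing Ozone or HDFS class.
final class FcrLeaseManager {
  private static final class Lease {
    final long id;
    final long expiresAtMs;

    Lease(long id, long expiresAtMs) {
      this.id = id;
      this.expiresAtMs = expiresAtMs;
    }
  }

  private final int maxConcurrent;      // the "N" in the design above
  private final long leaseDurationMs;   // assumed timeout so a dead DN cannot hold a slot
  private final Map<String, Lease> active = new HashMap<>();
  private long nextId = 1;

  FcrLeaseManager(int maxConcurrent, long leaseDurationMs) {
    this.maxConcurrent = maxConcurrent;
    this.leaseDurationMs = leaseDurationMs;
  }

  /** Returns a non-zero lease ID if the DN may send its FCR now, or 0 if it should retry later. */
  synchronized long requestLease(String datanodeId, long nowMs) {
    expireStale(nowMs);
    Lease existing = active.get(datanodeId);
    if (existing != null) {
      return existing.id;               // repeated request from the same DN reuses its lease
    }
    if (active.size() >= maxConcurrent) {
      return 0L;                        // all N slots are taken
    }
    Lease lease = new Lease(nextId++, nowMs + leaseDurationMs);
    active.put(datanodeId, lease);
    return lease.id;
  }

  /** Called after SCM has processed the FCR; returns false if the lease is unknown or expired. */
  synchronized boolean completeReport(String datanodeId, long leaseId, long nowMs) {
    expireStale(nowMs);
    Lease lease = active.get(datanodeId);
    if (lease == null || lease.id != leaseId) {
      return false;
    }
    active.remove(datanodeId);          // free the slot for the next DN
    return true;
  }

  private void expireStale(long nowMs) {
    active.values().removeIf(l -> l.expiresAtMs <= nowMs);
  }
}
```

With N = 2, for example, a third datanode asking for a lease gets 0 back and 
retries on a later heartbeat, so SCM never holds more than two FCRs at a time.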

One tradeoff of rate limiting is that a new SCM might take longer to exit 
safe mode. However, this is better than an SCM OOM.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
