[jira] [Updated] (HDDS-4404) Datanode can go OOM when a Recon or SCM Server is very slow in processing reports.

Siyao Meng (Jira) Fri, 30 Oct 2020 09:36:40 -0700


     [ 
https://issues.apache.org/jira/browse/HDDS-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Siyao Meng updated HDDS-4404:
-----------------------------
    Target Version/s: 1.1.0

> Datanode can go OOM when a Recon or SCM Server is very slow in processing 
> reports.
> ----------------------------------------------------------------------------------
>
>                 Key: HDDS-4404
>                 URL: https://issues.apache.org/jira/browse/HDDS-4404
>             Project: Hadoop Distributed Data Store
>          Issue Type: Task
>          Components: Ozone Datanode
>    Affects Versions: 1.0.0
>            Reporter: Aravindan Vijayan
>            Assignee: Siyao Meng
>            Priority: Critical
>         Attachments: Screen Shot 2020-10-26 at 11.24.09 PM.png
>
>
> From [~nanda619]'s analysis.
> ContainerReportPublisher thread runs periodically (default interval 60s) in 
> Datanode and adds ContainerReport to StateContext (Queue).
> Heartbeat thread runs periodically (default interval 30s), picks up the 
> ContainerReport (if any) from StateContext.
> For a short time, the ContainerReport will be held in Datanode StateContext.
> For Recon, a change was made in datanode such that the ContainerReport will 
> be cached in Datanode StateContext separately for each endpoint (i.e. SCM and 
> Recon). As I see it, if Recon is configured in the Datanode, all the reports 
> that are to be sent to Recon will be pending in the StateContextQueue 
> (LinkedList).
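
The queueing behaviour described above can be sketched roughly as follows. This is an illustration only, not the Ozone code: the class ReportQueueSketch, its methods, and the string-valued reports are hypothetical stand-ins, and it assumes that a successful heartbeat drains everything pending for its endpoint. It shows how an unbounded per-endpoint LinkedList keeps growing when one endpoint (here Recon) never completes a heartbeat while the other (SCM) does.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-in for the per-endpoint report cache in StateContext.
public class ReportQueueSketch {
    // One unbounded queue of pending reports per endpoint name.
    private final Map<String, Queue<String>> pending = new HashMap<>();

    public ReportQueueSketch(String... endpoints) {
        for (String endpoint : endpoints) {
            pending.put(endpoint, new LinkedList<>());
        }
    }

    // Publisher side: every endpoint's queue receives the new report.
    public void publish(String report) {
        for (Queue<String> queue : pending.values()) {
            queue.add(report);
        }
    }

    // Heartbeat side (assumed behaviour): drain all reports pending for
    // one endpoint and return how many were sent.
    public int heartbeat(String endpoint) {
        Queue<String> queue = pending.get(endpoint);
        int sent = queue.size();
        queue.clear();
        return sent;
    }

    public int pendingFor(String endpoint) {
        return pending.get(endpoint).size();
    }

    public static void main(String[] args) {
        ReportQueueSketch ctx = new ReportQueueSketch("SCM", "Recon");
        for (int i = 1; i <= 1000; i++) {
            ctx.publish("ContainerReport-" + i);
            ctx.heartbeat("SCM"); // SCM heartbeats complete normally
            // Recon heartbeats are assumed never to complete, so nothing
            // drains its queue and every report stays on the heap.
        }
        System.out.println("SCM pending:   " + ctx.pendingFor("SCM"));
        System.out.println("Recon pending: " + ctx.pendingFor("Recon"));
    }
}
```

In this sketch the SCM queue ends at 0 pending reports while the Recon queue holds all 1000, since nothing bounds the list or discards stale reports.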



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

