[
https://issues.apache.org/jira/browse/HDDS-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Siyao Meng updated HDDS-4404:
-----------------------------
Target Version/s: 1.1.0
> Datanode can go OOM when a Recon or SCM Server is very slow in processing
> reports.
> ----------------------------------------------------------------------------------
>
> Key: HDDS-4404
> URL: https://issues.apache.org/jira/browse/HDDS-4404
> Project: Hadoop Distributed Data Store
> Issue Type: Task
> Components: Ozone Datanode
> Affects Versions: 1.0.0
> Reporter: Aravindan Vijayan
> Assignee: Siyao Meng
> Priority: Critical
> Attachments: Screen Shot 2020-10-26 at 11.24.09 PM.png
>
>
> From [~nanda619]'s analysis.
> ContainerReportPublisher thread runs periodically (default interval 60s) in
> Datanode and adds ContainerReport to StateContext (Queue).
> Heartbeat thread runs periodically (default interval 30s), picks up the
> ContainerReport (if any) from StateContext.
> For a short time, the ContainerReport will be held in Datanode StateContext.
> For Recon, a change was made in datanode such that the ContainerReport will
> be cached in Datanode StateContext separately for each endpoint (i.e. SCM and
> Recon). As I see it, if Recon is configured in the Datanode, all the reports
> that are to be sent to Recon will be pending in the StateContextQueue
> (LinkedList)
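
The queueing behaviour described above can be illustrated with a minimal Java sketch. This is not the actual Ozone code: the class name ReportQueueSketch, its methods, and the endpoint labels "scm" and "recon" are all hypothetical, and the timing is simplified to one report per heartbeat round. It only models the idea that each report is cached separately per endpoint, so an endpoint whose heartbeat never drains its queue accumulates reports without bound.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical model for illustration; NOT the real Ozone StateContext.
public class ReportQueueSketch {
    // One pending-report queue per endpoint (labels such as "scm", "recon" are assumed).
    private final Map<String, Queue<String>> pending = new HashMap<>();

    public ReportQueueSketch(List<String> endpoints) {
        for (String endpoint : endpoints) {
            pending.put(endpoint, new LinkedList<>());
        }
    }

    // Publisher side: every new report is cached separately for each endpoint.
    public synchronized void addReport(String report) {
        for (Queue<String> queue : pending.values()) {
            queue.add(report);
        }
    }

    // Heartbeat side: remove and return everything pending for one endpoint.
    public synchronized List<String> takeReports(String endpoint) {
        Queue<String> queue = pending.get(endpoint);
        List<String> taken = new ArrayList<>(queue);
        queue.clear();
        return taken;
    }

    public synchronized int pendingCount(String endpoint) {
        return pending.get(endpoint).size();
    }

    public static void main(String[] args) {
        ReportQueueSketch ctx = new ReportQueueSketch(List.of("scm", "recon"));
        for (int i = 0; i < 10; i++) {
            ctx.addReport("containerReport-" + i);
            ctx.takeReports("scm"); // the SCM heartbeat keeps up
            // No takeReports("recon"): models a Recon heartbeat that never completes.
        }
        System.out.println("scm pending: " + ctx.pendingCount("scm"));     // prints 0
        System.out.println("recon pending: " + ctx.pendingCount("recon")); // prints 10
    }
}
```

In this model the "recon" queue grows by one entry per published report for as long as nothing drains it, which is the kind of unbounded growth that could exhaust the heap if each entry were a full container report.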
--
This message was sent by Atlassian Jira
(v8.3.4#803005)