Sumit Agrawal created HDDS-7244:
-----------------------------------
Summary: Fix multiple reports queued up from same DN and using up
heap
Key: HDDS-7244
URL: https://issues.apache.org/jira/browse/HDDS-7244
Project: Apache Ozone
Issue Type: Improvement
Environment: Environment Observation:
# Disk is not SSD
# DNs: 96; containers per DN: approximately 12K - 15K on average
# Heartbeat interval was initially 60 seconds, but the same behavior was
observed with 20 minutes as well.
# Multiple threads were observed waiting on a write lock in the thread dump,
but there was no deadlock.
Reporter: Sumit Agrawal
Assignee: Sumit Agrawal
When processing of FCR/ICR reports from datanodes is slow, the queue keeps
filling up with these reports. The slowness can be due to:
* A slow disk (the recommended SSD is not configured)
* A large number of datanodes and containers
* Heartbeats from DNs configured with a high frequency
* Thread contention / starvation where global locks are present
The issue is observed when several of the above conditions occur together:
reports keep being appended to the EventQueue for processing, which over time
uses up the heap and results in a heap dump.
This is observed on both the SCM and the Recon server.
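
One way to picture a fix is a queue that holds at most one pending report per
datanode, so a slow consumer cannot accumulate a backlog from the same DN. The
sketch below is only an illustration under stated assumptions and is not the
change made for HDDS-7244: the class LatestReportQueue, its methods, and the
string datanode IDs are all hypothetical, and it assumes a newer full report
(FCR) from a datanode makes its older, unprocessed one redundant. Incremental
reports (ICR) carry deltas, so they would presumably need merging rather than
replacement.

```java
import java.util.AbstractMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical sketch (not Ozone code): a queue that keeps at most one
 * pending report per datanode, so memory is bounded by the datanode count.
 */
final class LatestReportQueue<R> {
    // Insertion-ordered map: re-inserting an existing key keeps its position.
    private final Map<String, R> pending = new LinkedHashMap<>();

    /**
     * Queues a report for the given datanode.
     * Returns true if it replaced an older, still unprocessed report.
     */
    synchronized boolean offer(String datanodeId, R report) {
        Objects.requireNonNull(datanodeId, "datanodeId");
        Objects.requireNonNull(report, "report");
        return pending.put(datanodeId, report) != null;
    }

    /** Removes and returns the oldest pending entry, or null if empty. */
    synchronized Map.Entry<String, R> poll() {
        Iterator<Map.Entry<String, R>> it = pending.entrySet().iterator();
        if (!it.hasNext()) {
            return null;
        }
        // Copy the entry before removing it from the backing map.
        Map.Entry<String, R> head =
            new AbstractMap.SimpleImmutableEntry<>(it.next());
        it.remove();
        return head;
    }

    /** Number of datanodes that currently have a pending report. */
    synchronized int size() {
        return pending.size();
    }
}
```

With this shape, the number of queued entries never exceeds the number of
datanodes (96 in the environment above), regardless of how far processing
falls behind. A datanode that reports again keeps its original place in the
queue, so frequent reporters do not starve the others.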