[jira] [Resolved] (HDDS-11430) [QUESTION] A bunch of irrelevant pipelines on datanode

Nandakumar (Jira) Wed, 11 Sep 2024 22:26:03 -0700


     [ 
https://issues.apache.org/jira/browse/HDDS-11430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Nandakumar resolved HDDS-11430.
-------------------------------
    Resolution: Invalid

A GitHub Discussion was created to continue the conversation:
https://github.com/apache/ozone/discussions/7186


> [QUESTION] A bunch of irrelevant pipelines on datanode
> ------------------------------------------------------
>
>                 Key: HDDS-11430
>                 URL: https://issues.apache.org/jira/browse/HDDS-11430
>             Project: Apache Ozone
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>            Reporter: Vyacheslav Tutrinov
>            Assignee: Nandakumar
>            Priority: Critical
>         Attachments: Screenshot 2024-09-04 at 19.22.13.png
>
>
> We ran into the following case:
>  * 9 datanodes
>  * a bucket with RATIS/THREE replication
>  * around 6 TB of total disk space
>  * after a large number of write operations, SCM started responding with an 
> error saying there were not enough datanodes to allocate a new pipeline
>  * the datanodes' logs contained entries about "volume space is not enough"
>  * the cluster stayed in that state for some time (2-3 hours) while clients 
> kept trying to write keys
>  * at some point the cluster was rebooted, and 6 datanodes did not enter the 
> HEALTHY state
>  * a heap dump taken from one of those datanodes showed a large number of 
> SegmentedRaftLog objects
> !Screenshot 2024-09-04 at 19.22.13.png|width=845,height=630!
>  * these objects correspond to the pipelines whose creation had been 
> attempted earlier
>  * the datanodes did not start correctly until the 
> dfs.container.ratis.log.appender.queue.byte-limit configuration property was 
> decreased to 100KB (the default value is 32MB)
>  * without reducing that property, the datanodes' heap usage grew too large 
> for them to start
>  * eventually, the datanodes removed the unnecessary pipelines and entered 
> the HEALTHY state
> h3. QUESTION
> Why did SCM try to create pipelines on full datanodes, and how can this be avoided?
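
For a rough sense of why lowering dfs.container.ratis.log.appender.queue.byte-limit
relieved heap pressure in the report above, the following Python sketch estimates an
upper bound on memory held in log appender queues. It is a back-of-the-envelope model,
not Ozone or Ratis code: it assumes every pipeline on a datanode can fill one queue per
follower (two followers for RATIS/THREE), and the pipeline count of 200 is a
hypothetical figure, since the report does not state one.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def worst_case_appender_bytes(pipelines: int, byte_limit: int, followers: int = 2) -> int:
    """Upper bound on queued bytes, ASSUMING each pipeline fills one
    appender queue per follower up to byte_limit (a simplification)."""
    return pipelines * followers * byte_limit


# Hypothetical pipeline count; the report does not give the real number.
PIPELINES = 200

default_limit = 32 * MIB   # default value quoted in the report (32MB)
reduced_limit = 100 * KIB  # value the reporter switched to (100KB)

before = worst_case_appender_bytes(PIPELINES, default_limit)
after = worst_case_appender_bytes(PIPELINES, reduced_limit)

print(f"default limit: up to {before / GIB:.1f} GiB")  # 12.5 GiB
print(f"reduced limit: up to {after / MIB:.1f} MiB")   # 39.1 MiB
```

Under these assumptions the bound scales linearly with the number of pipelines, which
would be consistent with the datanodes recovering once the limit was lowered and the
unneeded pipelines were removed.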



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

