[ 
https://issues.apache.org/jira/browse/HDDS-8417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-8417:
-----------------------------------
    Fix Version/s:     (was: 1.4.0)

> Cap on queue of commands at DN
> ------------------------------
>
>                 Key: HDDS-8417
>                 URL: https://issues.apache.org/jira/browse/HDDS-8417
>             Project: Apache Ozone
>          Issue Type: Improvement
>            Reporter: Sumit Agrawal
>            Assignee: Sumit Agrawal
>            Priority: Minor
>              Labels: proton
>
> When a command runs very slowly, the command queue has been observed to 
> pile up. In one environment, the following commands were queued, using 
> approximately 5GB of memory:
>  * Block delete command: 2.8k
>  * Close Container command: 13k
>  * Close pipeline command: 2390k
>  * replicate container command: 57k
> This happens when the disk is almost full and commands run very slowly.
> SCM keeps sending the same command on every heartbeat.
> So a cap on the queue is required, and further command accumulation needs 
> to be rejected. This needs to be done for each command type queue, based on 
> memory occupancy and on whether SCM repeats the command.
>  
> *Command size pattern:*
> DeleteBlockCommand: 1.7MB  (having 5804 containers)
> ClosePipelineCommand: 130 bytes
> Create Pipeline: 3.3KB  (DN Info: 1KB * 3 DN)
> Replicate Container: 1.2KB (DN Info 1KB)
> CloseContainer: 1.2KB (Encoded token 1KB)
> DeleteContainer: 100 bytes
>  
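As a rough cross-check of the "approx 5GB" figure, the queued counts above can be multiplied by the per-command sizes in the size pattern. This is only a sketch: the class and method names are hypothetical, decimal units are assumed (1KB = 1000 bytes, 1MB = 1000000 bytes), and it assumes every queued block delete command is as large as the 1.7MB sample, which may not hold in practice.

```java
final class QueueMemoryEstimate {
  /** Bytes used by `count` queued commands of a fixed size (assumed uniform). */
  static long estimateBytes(long count, long bytesPerCommand) {
    return count * bytesPerCommand;
  }

  /** Sum over the four queues listed in the report. */
  static long totalBytes() {
    long deleteBlock = estimateBytes(2_800, 1_700_000);     // 2.8k * 1.7MB
    long closeContainer = estimateBytes(13_000, 1_200);     // 13k * 1.2KB
    long closePipeline = estimateBytes(2_390_000, 130);     // 2390k * 130 bytes
    long replicateContainer = estimateBytes(57_000, 1_200); // 57k * 1.2KB
    return deleteBlock + closeContainer + closePipeline + replicateContainer;
  }

  public static void main(String[] args) {
    // Print the total in decimal gigabytes.
    System.out.printf("estimated total: %.2f GB%n", totalBytes() / 1e9);
  }
}
```

Under these assumptions the block delete queue accounts for most of the memory, even though it holds the fewest commands.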
> *DeleteBlockCommand:* triggered by SCM every 5 minutes by default, and 
> repeated if the response for some blocks has not arrived. So this operation 
> can be retried if the DN ignores it or sends no response.
> Special handling with a max queue size of *5* should be enough.
>  
> *ClosePipeline/CloseContainer/DeleteContainer:* retried by SCM for 
> every container/pipeline, so a queue size of *5000* should be enough.
> *CreatePipeline/ReplicateContainer:* controlled by SCM and not retried 
> or repeated, so a queue size of 5000 can be supported.
> *Others* can follow a similar default cap of *5000*.
>  
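The proposed limits could be sketched as one bounded queue per command type that rejects new commands once its cap is reached. This is a minimal illustration, not the Ozone datanode implementation: `CappedCommandQueue`, `CommandType`, and the use of `String` as a command payload are all hypothetical, and the cap here is based on command count only, not on memory occupancy.

```java
import java.util.ArrayDeque;
import java.util.EnumMap;
import java.util.Map;
import java.util.Queue;

final class CappedCommandQueue {
  /** Hypothetical command types, named after the commands in the description. */
  enum CommandType {
    DELETE_BLOCK, CLOSE_PIPELINE, CLOSE_CONTAINER,
    DELETE_CONTAINER, CREATE_PIPELINE, REPLICATE_CONTAINER
  }

  static final int DEFAULT_CAP = 5000;   // proposed cap for most command types
  static final int DELETE_BLOCK_CAP = 5; // proposed cap for DeleteBlockCommand

  private final Map<CommandType, Queue<String>> queues =
      new EnumMap<>(CommandType.class);
  private final Map<CommandType, Integer> caps =
      new EnumMap<>(CommandType.class);

  CappedCommandQueue() {
    for (CommandType type : CommandType.values()) {
      queues.put(type, new ArrayDeque<>());
      caps.put(type,
          type == CommandType.DELETE_BLOCK ? DELETE_BLOCK_CAP : DEFAULT_CAP);
    }
  }

  /** Returns false, without queuing, when the queue for this type is full. */
  synchronized boolean offer(CommandType type, String command) {
    Queue<String> queue = queues.get(type);
    if (queue.size() >= caps.get(type)) {
      return false; // rejected; SCM is expected to resend retried commands
    }
    return queue.add(command);
  }

  /** Removes and returns the oldest command of this type, or null if empty. */
  synchronized String poll(CommandType type) {
    return queues.get(type).poll();
  }

  synchronized int size(CommandType type) {
    return queues.get(type).size();
  }
}
```

Rejecting at the tail keeps the oldest commands, which suits commands that SCM resends anyway. For CreatePipeline and ReplicateContainer, which are described as not retried, a rejected command would be lost in this sketch, so those types may need different handling.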



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
