[
https://issues.apache.org/jira/browse/HDDS-8417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sumit Agrawal updated HDDS-8417:
--------------------------------
Description:
When commands execute very slowly, the command queue on the DN is observed to
pile up. In one environment, the following commands were queued, using
approximately 5GB of memory:
* Block delete command: 2.8k
* Close Container command: 13k
* Close pipeline command: 2390k
* replicate container command: 57k
This happens when the disk is almost full and commands run very slowly, while
SCM keeps sending the same commands on every heartbeat.
So a cap on the queue is required, and further command accumulation must be
rejected once the cap is reached.
This needs to be done per command-type queue, based on memory occupancy and on
whether SCM repeats the command.
*Command size pattern:*
DeleteBlockCommand: 1.7MB (having 5804 containers)
ClosePipelineCommand: 130 bytes
Create Pipeline: 3.3KB (DN Info: 1KB * 3 DN)
Replicate Container: 1.2KB (DN Info 1KB)
CloseContainer: 1.2KB (Encoded token 1KB)
DeleteContainer: 100 bytes
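
As a rough cross-check of the 5GB figure, the queued counts reported above can be multiplied by these per-command sizes. This is only a sketch: it assumes decimal units (1KB = 1000 bytes) and that every queued command has the quoted size, which need not hold (the 1.7MB DeleteBlockCommand is the one carrying 5804 containers). The class and method names are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Back-of-the-envelope estimate: queued count * per-command size.
// Assumption: decimal units and uniform command sizes per type.
final class QueueMemoryEstimate {
  static long bytes(long count, long sizeBytes) {
    return count * sizeBytes;
  }

  // Counts and sizes are the ones quoted in this issue.
  static Map<String, Long> observed() {
    Map<String, Long> m = new LinkedHashMap<>();
    m.put("deleteBlocks", bytes(2_800, 1_700_000));     // 2.8k * 1.7MB
    m.put("closeContainer", bytes(13_000, 1_200));      // 13k * 1.2KB
    m.put("closePipeline", bytes(2_390_000, 130));      // 2390k * 130 bytes
    m.put("replicateContainer", bytes(57_000, 1_200));  // 57k * 1.2KB
    return m;
  }

  static long total(Map<String, Long> m) {
    long sum = 0;
    for (long v : m.values()) {
      sum += v;
    }
    return sum;
  }

  public static void main(String[] args) {
    Map<String, Long> m = observed();
    m.forEach((k, v) -> System.out.printf("%-20s %,d bytes%n", k, v));
    System.out.printf("%-20s %,d bytes%n", "total", total(m));
  }
}
```

Under these assumptions the block delete queue alone accounts for about 4.76GB of a total of about 5.15GB, which is why it gets the smallest cap below.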
*DeleteBlockCommand:* triggered by SCM every 5 minutes by default, and repeated
if the response for some blocks has not been received. So the operation is
retried if the DN ignores it or sends no response.
Special handling with a max queue size of *5* should be enough.
*ClosePipeline/CloseContainer/DeleteContainer:* retried by SCM for every
container/pipeline, so a queue size of *5000* should be enough.
*CreatePipeline/ReplicateContainer:* controlled by SCM and not retried or
repeated, so a queue size of 5000 can be supported.
*Others:* can follow the same default cap of *5000*.
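
The per-type caps proposed above could be sketched as follows. This is a minimal illustration, not the Ozone datanode implementation: CmdType, CappedCommandQueues and their methods are hypothetical names, commands are represented as plain strings, and a rejected command is simply dropped on the assumption that SCM will resend it where it retries.

```java
import java.util.ArrayDeque;
import java.util.EnumMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical command types, mirroring the ones named in this issue.
enum CmdType {
  DELETE_BLOCKS, CLOSE_PIPELINE, CLOSE_CONTAINER, DELETE_CONTAINER,
  CREATE_PIPELINE, REPLICATE_CONTAINER, OTHER
}

// One bounded FIFO queue per command type.
final class CappedCommandQueues {
  static final int DEFAULT_CAP = 5000;

  private final Map<CmdType, Integer> caps = new EnumMap<>(CmdType.class);
  private final Map<CmdType, Queue<String>> queues = new EnumMap<>(CmdType.class);

  CappedCommandQueues() {
    // Only DeleteBlockCommand gets a special, much smaller cap.
    caps.put(CmdType.DELETE_BLOCKS, 5);
    for (CmdType t : CmdType.values()) {
      queues.put(t, new ArrayDeque<>());
    }
  }

  int capFor(CmdType t) {
    return caps.getOrDefault(t, DEFAULT_CAP);
  }

  // Returns false (command rejected) when the queue for this type is full.
  synchronized boolean offer(CmdType t, String cmd) {
    Queue<String> q = queues.get(t);
    if (q.size() >= capFor(t)) {
      return false;
    }
    return q.add(cmd);
  }

  // Removes and returns the oldest queued command of this type, or null.
  synchronized String poll(CmdType t) {
    return queues.get(t).poll();
  }

  synchronized int size(CmdType t) {
    return queues.get(t).size();
  }
}
```

With these caps and the sizes quoted above, the block delete queue would hold at most about 5 * 1.7MB = 8.5MB, and a 5000-entry queue of 1.2KB commands about 6MB.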
was:
When commands execute very slowly, the command queue on the DN is observed to
pile up. In one environment, the following commands were queued, using
approximately 5GB of memory:
* Block delete command: 2.8k
* Close Container command: 13k
* Close pipeline command: 2390k
* replicate container command: 57k
This happens when the disk is almost full and commands run very slowly, while
SCM keeps sending the same commands on every heartbeat.
So a cap on the queue is required, and further command accumulation must be
rejected once the cap is reached.
This needs to be done per command-type queue, based on memory occupancy and on
whether SCM repeats the command.
> Cap on queue of commands at DN
> ------------------------------
>
> Key: HDDS-8417
> URL: https://issues.apache.org/jira/browse/HDDS-8417
> Project: Apache Ozone
> Issue Type: Improvement
> Reporter: Sumit Agrawal
> Assignee: Sumit Agrawal
> Priority: Minor
> Labels: proton
> Fix For: 1.4.0
>
>