[
https://issues.apache.org/jira/browse/HDDS-8865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733351#comment-17733351
]
ChenXi commented on HDDS-8865:
------------------------------
After some analysis, we found several possible performance bottlenecks:
- SCM sending mechanism: SCM sends a delete transaction TX1 to DN1. If DN1 has
not reported (via heartbeat) that TX1 is complete before SCM next creates
delete transactions, SCM sends TX1 to DN1 again. This second TX1 is redundant
and does no useful work.
- DN delete command processing: `DeleteBlocksCommandHandlerThread` often gets
stuck waiting a long time for a Container lock. While it waits, a large number
of transactions can be observed piling up in the thread's queue, and all of
them are blocked behind the stuck one. In addition, because of the SCM sending
mechanism above, SCM ends up sending delete transactions only to the stuck DN.
Other DNs, which may be healthy, receive no new transactions.
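The SCM-side effect can be made concrete with a small hypothetical Java model. This is not Ozone's actual code. The model makes three assumptions:

- SCM scans uncommitted delete transactions in txId order.
- SCM sends at most a fixed number of transactions per interval. `BUDGET` is an invented parameter.
- SCM resends anything a DN has not yet acknowledged.

A healthy DN acknowledges everything before the next interval. A stuck DN never does.

```java
import java.util.*;

/**
 * Hypothetical model of SCM's per-interval delete-command generation.
 * Assumptions (not Ozone's implementation): uncommitted transactions are
 * scanned in txId order, at most BUDGET are sent per interval, and any
 * transaction a DN has not acknowledged is sent again.
 */
public class ScmResendModel {
    record Tx(long id, String dn) {}

    static final int BUDGET = 4; // hypothetical per-interval send limit

    /** Returns, for each round, how many transactions SCM sent to each DN. */
    public static List<Map<String, Integer>> simulate(int rounds, Set<String> stuckDns) {
        String[] dns = {"DN1", "DN2", "DN3"};
        List<Tx> log = new ArrayList<>(); // uncommitted transactions, txId order
        long nextId = 1;
        List<Map<String, Integer>> sentPerRound = new ArrayList<>();
        for (int r = 0; r < rounds; r++) {
            // New delete transactions arrive for every DN each interval.
            for (String dn : dns) {
                log.add(new Tx(nextId++, dn));
            }
            // SCM takes the oldest uncommitted transactions, including ones it
            // already sent in earlier rounds (the duplicate re-send).
            List<Tx> batch = new ArrayList<>(log.subList(0, Math.min(BUDGET, log.size())));
            Map<String, Integer> sent = new TreeMap<>();
            for (Tx tx : batch) {
                sent.merge(tx.dn(), 1, Integer::sum);
            }
            sentPerRound.add(sent);
            // Healthy DNs finish and report via heartbeat before the next
            // interval; stuck DNs never acknowledge, so their txs stay first.
            Set<Tx> acked = new HashSet<>();
            for (Tx tx : batch) {
                if (!stuckDns.contains(tx.dn())) {
                    acked.add(tx);
                }
            }
            log.removeAll(acked);
        }
        return sentPerRound;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> rounds = simulate(6, Set.of("DN1"));
        for (int r = 0; r < rounds.size(); r++) {
            System.out.println("round " + r + ": " + rounds.get(r));
        }
    }
}
```

In this model, with `DN1` stuck, its unacknowledged transactions remain the oldest in the log and are resent every round. From round 4 onward the whole batch goes to DN1, and DN2/DN3 receive nothing. With no stuck DN, every round sends exactly one transaction to each DN.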
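The DN-side blocking can be sketched the same way, as a hypothetical model. The class, the per-container `ReentrantLock` map, and `deleteBlocks` are illustrative assumptions, not Ozone APIs. A single-thread executor stands in for `DeleteBlocksCommandHandlerThread`. While another thread holds container 1's lock, a transaction for the unrelated container 2 cannot run either, because it is queued behind the transaction waiting for that lock.

```java
import java.util.concurrent.*;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical model of head-of-line blocking in a single handler thread. */
public class HeadOfLineBlocking {
    // Hypothetical per-container locks; Ozone's real locking may differ.
    static final ConcurrentHashMap<Long, ReentrantLock> LOCKS = new ConcurrentHashMap<>();

    static void deleteBlocks(long containerId) {
        ReentrantLock lock = LOCKS.computeIfAbsent(containerId, id -> new ReentrantLock());
        lock.lock();
        try {
            // Placeholder for the actual block deletion work.
        } finally {
            lock.unlock();
        }
    }

    /** Returns {tx2 finished while tx1 waited, tx1 finished after release}. */
    public static boolean[] run() throws Exception {
        ReentrantLock c1 = LOCKS.computeIfAbsent(1L, id -> new ReentrantLock());
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        // Another operation holds container 1's lock for a long time.
        Thread holder = new Thread(() -> {
            c1.lock();
            try {
                held.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                c1.unlock();
            }
        });
        holder.start();
        held.await();

        ExecutorService handler = Executors.newSingleThreadExecutor();
        Future<?> tx1 = handler.submit(() -> deleteBlocks(1L)); // waits for c1
        Future<?> tx2 = handler.submit(() -> deleteBlocks(2L)); // unrelated container
        Thread.sleep(200);
        boolean tx2DoneWhileBlocked = tx2.isDone(); // still queued behind tx1

        release.countDown(); // lock holder finishes
        tx2.get(5, TimeUnit.SECONDS);
        boolean tx1Done = tx1.isDone();
        handler.shutdown();
        holder.join();
        return new boolean[] {tx2DoneWhileBlocked, tx1Done};
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = run();
        System.out.println("tx2 done while tx1 waits for lock: " + r[0]);
        System.out.println("tx1 done after lock released: " + r[1]);
    }
}
```

Because the queue is served by a single thread, one slow lock acquisition delays every transaction behind it, regardless of which container those transactions touch.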
> Ozone asynchronous deletion performance optimization
> ----------------------------------------------------
>
> Key: HDDS-8865
> URL: https://issues.apache.org/jira/browse/HDDS-8865
> Project: Apache Ozone
> Issue Type: Improvement
> Reporter: ChenXi
> Assignee: ChenXi
> Priority: Major
>
> Background:
> We have a cluster that writes many small keys. Users later merge these small
> keys into one big key, so the small keys are not kept for long. As a result,
> the QPS of delete requests is almost the same as that of writes.
> The current key deletion performance cannot keep up: available disk capacity
> keeps shrinking, but in fact, only a small portion of the user's valid data