[ https://issues.apache.org/jira/browse/HDDS-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767605#comment-17767605 ]
Stephen O'Donnell commented on HDDS-7728:
-----------------------------------------
I suspect there are various other scenarios where orphan blocks can appear in
datanode containers: when writes fail, when a client gets killed, or, with EC,
when the final stripe of a block is bad.
If there are other such scenarios, then having RM delete the replica with the
largest delete transaction ID does not solve the problem completely, and
another Recon-based solution is needed anyway.
It is also not as simple to add this logic to RM as you may think, as RM needs
to balance several other concerns:
1. For replication, the source is currently picked as the least loaded DN, not
a random one. This is integral to the throttling design.
2. For replica deletion, we need to consider the placement policy, and then we
may want to prefer removing the replica on the DN with the least free space.
3. Over-replication handling also comes into play with the balancer, which
decides which replica to copy and then which one to remove, so you need to
consider that too.
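Points 2 and 3 can be illustrated with a minimal Java sketch of choosing which replica of an over-replicated container to delete. Everything in it is hypothetical: the `Replica` record, the `chooseReplicaToDelete` method and the "at least two racks" rule standing in for a real placement policy are assumptions for illustration, not the ReplicationManager API. It filters out removals that would break the placement rule first, and only then prefers the DN with the least free space.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch, not the Ozone ReplicationManager API. */
public class OverReplicationSketch {

  /** Hypothetical replica view: datanode, rack and free space in bytes. */
  record Replica(String datanode, String rack, long freeSpaceBytes) {}

  /** Assumed stand-in for a placement policy: copies must span this many racks. */
  static final int MIN_RACKS = 2;

  /** Counts the distinct racks holding the given replicas. */
  static int rackCount(List<Replica> replicas) {
    Set<String> racks = new HashSet<>();
    for (Replica r : replicas) {
      racks.add(r.rack());
    }
    return racks.size();
  }

  /**
   * Among replicas whose removal still satisfies the placement rule, returns
   * the one on the datanode with the least free space. Returns empty if no
   * single removal keeps the placement rule satisfied.
   */
  static Optional<Replica> chooseReplicaToDelete(List<Replica> replicas) {
    List<Replica> allowed = new ArrayList<>();
    for (Replica candidate : replicas) {
      List<Replica> remaining = new ArrayList<>(replicas);
      remaining.remove(candidate); // List.remove(Object), not remove(int)
      if (rackCount(remaining) >= MIN_RACKS) {
        allowed.add(candidate);
      }
    }
    return allowed.stream()
        .min(Comparator.comparingLong(Replica::freeSpaceBytes));
  }

  public static void main(String[] args) {
    List<Replica> replicas = List.of(
        new Replica("dn1", "rackA", 500L),
        new Replica("dn2", "rackA", 100L),
        new Replica("dn3", "rackB", 50L));
    System.out.println(chooseReplicaToDelete(replicas));
  }
}
```

With these inputs the sketch picks dn2 rather than dn3, even though dn3 has less free space, because removing dn3 would leave every copy on rackA. Even this toy version has to order two rules, before throttling or the balancer are considered at all.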
Then a natural extension of the problem is whether to check the delete
transaction ID of normally replicated containers to see whether it is behind on
some replicas and, if it is, treat the container as under-replicated and start
the process of making new copies and removing the bad one.
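The check described in the previous paragraph could look roughly like the following Java sketch, which compares the delete transaction ID reported by each replica of one container and flags those behind the highest value. The `ReplicaReport` record and the `laggingReplicas` method are hypothetical names chosen for illustration, not Ozone APIs.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of a delete-transaction lag check; not an Ozone API. */
public class DeleteTxnLagCheck {

  /** Hypothetical per-replica report: datanode and its last applied delete transaction ID. */
  record ReplicaReport(String datanode, long deleteTransactionId) {}

  /** Returns the replicas whose delete transaction ID is below the maximum reported. */
  static List<ReplicaReport> laggingReplicas(List<ReplicaReport> reports) {
    long maxTxnId = reports.stream()
        .mapToLong(ReplicaReport::deleteTransactionId)
        .max()
        .orElse(0L); // no reports means nothing can lag
    return reports.stream()
        .filter(r -> r.deleteTransactionId() < maxTxnId)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<ReplicaReport> reports = List.of(
        new ReplicaReport("dn1", 42L),
        new ReplicaReport("dn2", 42L),
        new ReplicaReport("dn3", 37L));
    // Each lagging replica would be a candidate for re-replication from a
    // healthy copy, followed by deletion of the stale one.
    for (ReplicaReport r : laggingReplicas(reports)) {
      System.out.println("lagging: " + r.datanode());
    }
  }
}
```

A lower ID may only mean that a DN has not yet processed the latest deletes, so a real check would presumably need some grace period before acting on it, which is one more rule RM would have to carry.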
All these rules and this complexity add up. If it completely solved the problem
for all orphan block scenarios, it might be a good idea, but I am not convinced
it does. If there are other ways orphan blocks can creep in, then we need
another solution anyway, and in that case we are better off avoiding adding all
these rules to RM and implementing the overall solution in a single place.
> Block should be safely deleted from the containers if they are instructed
> from OM and containers are in missing state.
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: HDDS-7728
> URL: https://issues.apache.org/jira/browse/HDDS-7728
> Project: Apache Ozone
> Issue Type: Improvement
> Components: SCM
> Affects Versions: 1.3.0
> Reporter: Uma Maheswara Rao G
> Assignee: Ashish Kumar
> Priority: Major
>
> Currently, when OM instructs deletion of blocks and the containers are in a
> missing state, the deletion may not be processed properly. This Jira is to
> track this requirement and implement safe deletion of blocks, whatever state
> they are in. Otherwise, containers would never get cleaned up even though all
> blocks in those files were deleted.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)