[ 
https://issues.apache.org/jira/browse/HDDS-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767569#comment-17767569
 ] 

Sammi Chen commented on HDDS-7728:
----------------------------------

{quote}SCM requires that all replicas are empty before moving the container to 
deleted state. One non-empty replica will block deletion of all replicas.
My point is that both cases can be solved by the missing/orphan container 
cleanup. If we are already implementing missing container cleanup then there is 
no need to add complexity to the RM to additionally handle the orphan block 
case as well.{quote}

I guess you mean the case where a container has 4 replicas, 3 of which are 
empty and 1 has blocks. Recon can identify this container as an orphan 
container to delete. This is one special case of orphan blocks. 

The majority case is the following, where a container has 4 live replicas:
* Replica-1 10 blocks
* Replica-2 10 blocks
* Replica-3 10 blocks
* Replica-4 15 blocks 

Replica-4 has 15 blocks; the extra 5 blocks are orphan blocks, and there are no 
pending block deletion txs for this container. Currently no single module knows 
which 5 blocks are orphans, because the DN's container report doesn't include 
per-block info. A single DN also doesn't know which blocks are orphans, as it 
only knows about one replica of the container. 
Since there are 4 replicas, RM will choose one to delete. IIRC, it currently 
picks the first one based on some hash sort result. In the above case, any 
replica can be a delete candidate. If we are lucky and Replica-4 is chosen 
and deleted, both the over-replication and the orphan blocks are resolved for 
this container. If another replica is chosen and deleted, the over-replication 
is resolved but the orphan blocks are still there. 
So the proposal is to leverage deleteTransactionId in the container replica 
info. Each container replica has two IDs. One is blockCommitSequenceId (bcsid), 
which increases monotonically every time metadata is updated for an OPEN 
container. The other is deleteTransactionId, which is also an SCM-wide, 
globally monotonically increasing number. Once a container transitions from 
OPEN to CLOSED, bcsid never changes again. But we can still delete blocks in a 
CLOSED container. Every time a new batch of blocks is deleted, the 
deleteTransactionId of the container is updated. So the container replica 
with the smaller deleteTransactionId is the one that has orphan blocks compared 
with the others. In this way, the choice of which replica to delete is 
deterministic, and Replica-4 will be chosen. The orphan blocks are then 
resolved naturally when Replica-4 is deleted. 
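
To make the selection rule concrete, here is a minimal sketch, not the actual 
RM code. The ReplicaInfo class, the function name, the ID values and the 
tie-break on replica name are all hypothetical and only for illustration. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaInfo:
    # Hypothetical stand-in for the container replica info reported by a DN.
    name: str
    block_count: int
    bcsid: int
    delete_transaction_id: int


def pick_replica_to_delete(replicas):
    """Pick the replica with the smallest deleteTransactionId.

    Ties are broken by name (an assumption of this sketch) so that the
    result does not depend on the order of the input list.
    """
    return min(replicas, key=lambda r: (r.delete_transaction_id, r.name))


# Made-up IDs: Replica-4 lags behind on deleteTransactionId.
replicas = [
    ReplicaInfo("Replica-1", 10, 100, 7),
    ReplicaInfo("Replica-2", 10, 100, 7),
    ReplicaInfo("Replica-3", 10, 100, 7),
    ReplicaInfo("Replica-4", 15, 100, 5),
]

victim = pick_replica_to_delete(replicas)
print(victim.name)
```

With these made-up values the replica that lags behind on deleteTransactionId 
is the one selected, regardless of the order in which replicas are listed. 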


[~erose], I think we may have a communication gap here.  Let me summarize 
the cases of orphan containers and orphan blocks below. Setting aside the 
title of this JIRA, more cases than the one stated in the title are discussed. 

1. Orphan containers.  Containers that are no longer referenced from OM 
metadata. These containers may or may not have replicas reported from DNs.  For 
this type of container, I agree that Recon is the best place to do the cleanup 
if there are replicas reported, because Recon has the OM data and RM in SCM 
doesn't.  RM cannot know whether a container is orphan or not.  For these orphan 
containers, their related block deletion transactions in SCM, if any, can be 
skipped and deleted too.  Some containers can be both missing and orphan. 

2. Missing containers. Containers that are referenced from OM but don't have 
any replicas reported from DNs.  This means data loss, a severe problem for 
Ozone.  Containers of this type may have pending block deletion txs too. It's 
better to keep the container metadata, block deletion txs, and other 
container-related data untouched to preserve context for further data loss 
investigation. 

3. Over-replicated containers that are neither orphan nor missing. Two cases:
  a.  There are no pending block deletion txs for those containers. The 
proposal for this case is already explained at the beginning of this 
comment. 
  b.  There are pending block deletion txs for those containers.  It looks like 
RM and the block deletion service are not synchronized on this. RM can send 
a replica deletion command to one DN. In the meantime, the block deletion 
service can send block deletion transactions to four DNs. When 3 DNs ack 
the txs as successful, SCM deletes the transaction from RocksDB. So the above 
sample container could end up as:
* Replica-1 10 blocks -> 6 blocks
* Replica-2 10 blocks. // deleted
* Replica-3 10 blocks -> 6 blocks
* Replica-4 15 blocks  -> 11 blocks
The 5 extra orphan blocks are still there. The key point here is that Replica-4 
is not the one deleted. 
So we can see that, whether or not there are pending deletion txs for the 
over-replicated container, the key to resolving the orphan blocks is to choose 
the replica with the smaller deleteTransactionId to delete. 
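
The outcome of case 3b can be reproduced with a small simulation. This is only 
a sketch under assumptions: the block IDs, the contents of the deletion tx and 
the simulate function are invented, and a replica is modeled as a plain set of 
block IDs. 

```python
def simulate(replica_to_delete):
    """Return {replica: (block_count, orphan_count)} after RM deletes one
    replica and the block deletion service applies one deletion tx."""
    live = set(range(10))          # blocks 0..9, present in every replica
    orphans = set(range(10, 15))   # 5 orphan blocks, only in Replica-4
    replicas = {
        "Replica-1": set(live),
        "Replica-2": set(live),
        "Replica-3": set(live),
        "Replica-4": live | orphans,
    }
    deletion_tx = {0, 1, 2, 3}     # hypothetical pending tx deleting 4 blocks

    # RM removes one replica to fix the over-replication.
    del replicas[replica_to_delete]

    # The block deletion service applies the tx on the remaining replicas.
    for blocks in replicas.values():
        blocks.difference_update(deletion_tx)

    expected = live - deletion_tx  # blocks that should remain afterwards
    return {
        name: (len(blocks), len(blocks - expected))
        for name, blocks in replicas.items()
    }


print(simulate("Replica-2"))  # Replica-4 survives and keeps its orphan blocks
print(simulate("Replica-4"))  # no orphan blocks remain
```

Deleting Replica-2 leaves Replica-4 with 11 blocks, 5 of them orphans, which 
matches the list above. Deleting Replica-4 instead leaves three replicas with 
6 blocks each. 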

4. Under-replicated containers
RM will copy 1 replica to bring the container back to 3 replicas. Which replica 
is the better source? The replica with the bigger bcsid and the bigger 
deleteTransactionId. 
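
A minimal sketch of this source selection follows, not the actual RM code. The 
Replica class and the ID values are hypothetical, and comparing bcsid first and 
deleteTransactionId second is an assumption of the sketch, since the priority 
between the two IDs is not specified here. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    # Hypothetical stand-in for the container replica info reported by a DN.
    name: str
    bcsid: int
    delete_transaction_id: int


def pick_copy_source(replicas):
    """Prefer the replica with the biggest bcsid, then the biggest
    deleteTransactionId (this ordering is an assumption of the sketch)."""
    return max(replicas, key=lambda r: (r.bcsid, r.delete_transaction_id))


# Made-up IDs: both replicas share a bcsid, Replica-B lags on deletions.
candidates = [
    Replica("Replica-A", 100, 7),
    Replica("Replica-B", 100, 5),
]

source = pick_copy_source(candidates)
print(source.name)
```

Copying from the replica that has applied the most deletion batches avoids 
creating a new replica that already lags behind. 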

5. Mis-replicated containers
A mis-replicated container is equivalent to an under-replicated case plus an 
over-replicated case. Follow the solutions in item 3 and item 4 above 
respectively. 

6. Unhealthy containers, where all replicas are in an unhealthy state. 
I am not sure how the block deletion service currently handles this type of 
container. I need to check it further. 

All along, [~ashishk] and I have been proposing a solution for item 3 above, 
while you and Stephen have been emphasizing item 1 and item 2. 
Stephen also mentioned items 4 and 5. I think that is where the communication 
gap comes from.  I agree with Ethan's proposal that orphan containers be 
handled by Recon. For orphan blocks, the special case can be covered by orphan 
container handling, while the majority case is better handled by RM in SCM, 
since Recon doesn't have any advantage over RM on this problem.  If required, 
we can have a sync meeting on this topic. What do you think? 
[~erose][~sodonnell]. 

 

> Block should be safely deleted from the containers if they are instructed 
> from OM and containers are in missing state.
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-7728
>                 URL: https://issues.apache.org/jira/browse/HDDS-7728
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: SCM
>    Affects Versions: 1.3.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: Ashish Kumar
>            Priority: Major
>
> Currently, when OM instructs to delete blocks and the containers are in 
> missing state, the deletion may not be processed properly. This Jira is to 
> track this requirement and implement safe deletion of blocks whatever state 
> they are in. Otherwise containers would never get cleaned up even though all 
> blocks in those files are deleted. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

