[GitHub] [ozone] errose28 commented on pull request #3360: HDDS-6663. remove container reference from scm if its state is DELETED

GitBox Tue, 08 Nov 2022 17:23:28 -0800


errose28 commented on PR #3360:
URL: https://github.com/apache/ozone/pull/3360#issuecomment-1308066256


   Hi @JacksonYao287 thanks for continuing the work on this. I do still have 
some concerns with this proposal though.
   
   > first, in this patch, DELETED container is removed from scm by 
default(this can be configured). and for those unknown container , no matter it 
is empty or not, we will deleted it by default(this can be configured).
   
   This doesn't address my [original 
point](https://github.com/apache/ozone/pull/3360#issuecomment-1287340191) that 
these two configs conflict for some values. I also do not think that config 
keys should be used to fragment core internal operations like deletion. Ozone 
only needs one delete path and we should be confident that the one we choose 
works. If we are not confident in this delete container proposal and are trying 
to hide it behind a config then that is a sign we should not go forward with 
the change.
   
   > although Deleting a non-empty unknown container by default is an 
aggressive operation, i think it makes sense here. in development environment, 
we can set the default operation to log WARN, so it give us a chance to 
investigate the problem.
   
   This assumes the issue shows up in dev first, which may not be the case. 
There's so many variables in an Ozone deployment it is possible that the dev 
environments work fine, but the perfect storm of issues does not happen until 
some production deployment. Having debugged multiple dev and prod Ozone 
deployments I can confirm this does happen for corner case bugs. This is why I 
have been advocating for a more cautious approach.
   
   > my idea here is after container is closed, only deleteblockLog can shrink 
the container size, not calculateUsage in containerReport handler. the size of 
container is reduced only after the deleteTransaction is committed in 
deleteblockLog. so even if a container replica has some orphan blocks and a 
bigger size, scm has a definite size of the container. when the size of a 
container in scm is shrinked to zero, scm will delete all the replicas of the 
container.
   
   We should use block count to determine if the container is empty, as used 
bytes is not exact like you mentioned. Block count is currently used by SCM to 
determine if a container is empty. There are some invariants that I believe 
make it impossible to solve this problem without using Recon:
   - Deleted block log entries must eventually be deleted.
   - SCM cannot know all possible replicas at a given time. An older one could 
be resurrected at any time.
   - SCM's delete block API used by OM must be idempotent.
   
   Given these properties I believe we cannot know the definite size of the 
container once blocks start getting deleted. We can only know the current size 
of each replica. For example, say a block is deleted from all known container 
replicas and SCM's "definite" block count decreases by 1. SCM removes the entry 
from the block log. Now maybe OM missed the ack for that delete so it resends 
it. SCM would repeat the same steps, incorrectly decrementing the "definite" 
block count again. It has no permanent record of which blocks were already 
deleted. Since old datanodes can be resurrected, they are also not a source of 
truth for which block deletions have already been issued by SCM.
   
   In summary, I feel that my 
[proposal](https://github.com/apache/ozone/pull/3360#issuecomment-1297759990) 
has the following advantages:
   
   - No config changes
   Existing config keys remain with their original defaults and functionality. 
The only difference is that `hdds.scm.unknown-container.action` now only 
applies to unknown *and empty* containers. No new configs need to be added. No 
need to define behavior for unintuitive configuration combinations.
   
   - Only confirmed empty containers are deleted.
   This minimizes the risk of data loss due to current code bugs or future 
regressions that could for any reason cause SCM to "forget" a container, or 
datanodes to misreport a container.
   
   The disadvantage of my proposal is that for the [block deletion corner 
case](https://github.com/apache/ozone/pull/3360#issuecomment-1119204766) 
discussed earlier, containers with those blocks will remain in the system. 
However this is true even today with the current container delete flow, since 
all replicas must have zero block count for deletion to happen. The orphan 
block problem is a block problem, not a container problem. IMO this means it 
should not be solved as an extension of container deletion by deleting 
non-empty containers. I think orphan block cleanup with Recon is the correct 
answer to this, even if it may not be implemented in the near future.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [ozone] errose28 commented on pull request #3360: HDDS-6663. remove container reference from scm if its state is DELETED

Reply via email to