[ https://issues.apache.org/jira/browse/HDFS-11922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16076288#comment-16076288 ]
Weiwei Yang commented on HDFS-11922:
------------------------------------
There seems to be more and more demand for having this feature ready. I have
been thinking about this again. What we could do is add a few states to keys:
||State||Description||
|CREATING|The key is under creation; data is not fully flushed to disk.|
|CREATED|The key is created; it is visible in the namespace and ready for read/write.|
|MISSINGREPLICA|One or more replicas are missing, but at least one live replica is still available.|
|CORRUPTED|All replicas are lost; the key cannot be read.|
|STALE|The key is no longer valid; it needs to be purged from the namespace, and its data needs to be removed from the datanodes.|
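As a rough sketch, the states above could be captured as a simple enum in KSM. The name and comments below are illustrative only, not existing Ozone code:
{code:java}
/**
 * Illustrative sketch of the proposed key lifecycle states
 * (not existing Ozone code).
 */
public enum KeyState {
  CREATING,        // under creation, data not fully flushed to disk
  CREATED,         // visible in the namespace, ready for read/write
  MISSINGREPLICA,  // one or more replicas missing, at least one still live
  CORRUPTED,       // all replicas lost, the key cannot be read
  STALE            // invalid; purge from namespace, remove data on datanodes
}
{code}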
Deleting a key can then be implemented as follows (a sketch of the KSM side follows this list):
# KSM looks up the key in its DB to get the container where the key is stored
# KSM queries SCM to get the pipeline the container is replicated to
# KSM connects to each container server and updates the container metadata to mark the key as "stale" (the actual delete is done by an async thread on each datanode)
# KSM removes the key from its namespace (the key is no longer visible to clients)
# The container server on each datanode scans for "stale" keys in each container and deletes the files in the background; this can run either at a fixed interval or, better, be driven by key update events
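Here is a minimal sketch of what steps 1-4 could look like on the KSM side. Everything in it (KeyDeleter, the KeyDb/Scm/ContainerServer interfaces, and all method names) is a hypothetical placeholder to show the flow, not an existing Ozone API:
{code:java}
import java.io.IOException;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the KSM-side delete flow (steps 1-4 above).
public class KeyDeleter {
  interface KeyDb {                      // KSM key metadata DB
    String getContainer(String key) throws IOException;  // step 1
    void removeKey(String key) throws IOException;       // step 4
  }
  interface Scm {                        // SCM pipeline lookup
    List<String> getPipelineNodes(String container) throws IOException;
  }
  interface ContainerServer {            // container server on one datanode
    void markKeyStale(String container, String key) throws IOException;
  }

  private final KeyDb keyDb;
  private final Scm scm;
  private final Function<String, ContainerServer> connect;

  KeyDeleter(KeyDb db, Scm scm, Function<String, ContainerServer> connect) {
    this.keyDb = db;
    this.scm = scm;
    this.connect = connect;
  }

  void deleteKey(String key) throws IOException {
    String container = keyDb.getContainer(key);          // step 1
    for (String dn : scm.getPipelineNodes(container)) {  // step 2
      // step 3: mark stale; the actual delete runs async on the datanode
      connect.apply(dn).markKeyStale(container, key);
    }
    keyDb.removeKey(key);  // step 4: the key is no longer visible to clients
  }
}
{code}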
We just need to find a good place to store the key -> keystate (or keystate ->
keys) mapping; one possible layout is sketched below. This will help putKey as
well, to support the commit-key phase.
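For example, if we keep the mapping in a sorted KV store (as the KSM metadata DB already is), a keystate -> keys index with a state prefix would let a background scanner fetch all "stale" keys in one range scan. The "#state#" prefix format is only an assumption for illustration, and a TreeMap stands in for the DB here:
{code:java}
import java.util.Map;
import java.util.TreeMap;

// Illustrative keystate -> keys index over a sorted KV store.
public class KeyStateIndex {
  private final TreeMap<String, byte[]> db = new TreeMap<>();

  // Index entry: "#<state>#<keyName>" -> key metadata blob.
  void setState(String key, String state, byte[] meta) {
    db.put("#" + state + "#" + key, meta);
  }

  // All keys in one state come back from a single prefix range scan,
  // which sorted KV stores support cheaply.
  Map<String, byte[]> scanState(String state) {
    String prefix = "#" + state + "#";
    return db.subMap(prefix, prefix + Character.MAX_VALUE);
  }

  public static void main(String[] args) {
    KeyStateIndex idx = new KeyStateIndex();
    idx.setState("/vol1/bucket1/key1", "stale", new byte[0]);
    idx.setState("/vol1/bucket1/key2", "created", new byte[0]);
    System.out.println(idx.scanState("stale").keySet());
    // prints: [#stale#/vol1/bucket1/key1]
  }
}
{code}
A real implementation would also need to move a key between states atomically (delete the old index entry, write the new one), but the prefix layout is the main point.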
Thoughts?
> Ozone: KSM: Garbage collect deleted blocks
> ------------------------------------------
>
> Key: HDFS-11922
> URL: https://issues.apache.org/jira/browse/HDFS-11922
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ozone
> Reporter: Anu Engineer
> Priority: Critical
>
> We need to garbage collect deleted blocks from the Datanodes. There are two
> cases where we will have orphaned blocks. One is like the classical HDFS,
> where someone deletes a key and we need to delete the corresponding blocks.
> The other case is when someone overwrites a key -- an overwrite can be treated
> as a delete followed by a new put -- which means the older blocks need to be
> GC-ed at some point in time.
> A couple of JIRAs have discussed this in one form or another, so I am
> consolidating all those discussions in this JIRA.
> HDFS-11796 -- needs this issue fixed for some tests to pass.
> HDFS-11780 -- changed the old overwrite behavior to not support this feature
> for the time being.
> HDFS-11920 -- once again runs into this issue when a user tries to put an
> existing key.
> HDFS-11781 -- the delete key API in KSM only deletes the metadata and relies
> on GC for the datanodes.
> When we solve this issue, we should also consider two more aspects.
> One, we support versioning in buckets, and tracking which blocks are really
> orphaned is something that KSM will do. So delete and overwrite at some point
> need to decide how to handle versioning of buckets.
> Two, if a key exists in a closed container, then it is immutable; hence the
> strategy for removing the key might be more complex than just talking to an
> open container.
> cc : [~xyao], [~cheersyang], [~vagarychen], [~msingh], [~yuanbo],
> [~szetszwo], [~nandakumar131]
>