[
https://issues.apache.org/jira/browse/HDFS-11886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030676#comment-16030676
]
Weiwei Yang commented on HDFS-11886:
------------------------------------
Thanks [~vagarychen] for raising this problem and [~anu] for the design doc.
Let me check that I understand this correctly. The proposal adds a
*ksm-keys-under-progress.db* in KSM, and a key is moved from
*ksm-keys-under-progress.db* to *ksm.db* only if all the steps finish
successfully. This introduces more writes to disk:
# put key to inprogress db -> add key
# delete key in inprogress db -> commit key 1
# add key to ksm db -> commit key 2
Do we really need to persist this? Can we store the state in memory only? Only
if all the steps succeed do we commit the key to *ksm.db*; otherwise we discard
it. If KSM crashes before a key is committed, that key won't be written to the
KSM namespace, because the cache will be gone after KSM restarts. This acts
like a write cache in front of *ksm.db*.
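To make the in-memory idea concrete, here is a minimal sketch. All names in it
(PendingKeyCache, addKey, commitKey, abortKey, simulateRestart) are
hypothetical and not taken from the Ozone code base, and a plain map stands in
for *ksm.db*. It only illustrates that an uncommitted key never reaches the
persistent store.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; none of these names come from the Ozone code base. */
class PendingKeyCache {
    // Stand-in for the persistent ksm.db; a real store would write to disk.
    private final Map<String, String> committed = new ConcurrentHashMap<>();
    // Keys whose putKey is still in flight; deliberately memory-only.
    private final Map<String, String> pending = new ConcurrentHashMap<>();

    /** Add key: remember the allocated block in memory, no disk write. */
    void addKey(String key, String blockInfo) {
        pending.put(key, blockInfo);
    }

    /** Commit key: all steps succeeded, so write the entry to the store. */
    boolean commitKey(String key) {
        String blockInfo = pending.remove(key);
        if (blockInfo == null) {
            return false; // unknown key, or already committed or aborted
        }
        committed.put(key, blockInfo);
        return true;
    }

    /** A later step failed: drop the entry; nothing was persisted. */
    void abortKey(String key) {
        pending.remove(key);
    }

    /** Only committed keys are visible in the namespace. */
    Optional<String> lookup(String key) {
        return Optional.ofNullable(committed.get(key));
    }

    /** Simulates a KSM restart: pending entries vanish, committed ones stay. */
    void simulateRestart() {
        pending.clear();
    }

    public static void main(String[] args) {
        PendingKeyCache cache = new PendingKeyCache();
        cache.addKey("key-1", "block-1");
        cache.commitKey("key-1");
        cache.addKey("key-2", "block-2");
        cache.simulateRestart(); // key-2 was never committed
        System.out.println(cache.lookup("key-1"));
        System.out.println(cache.lookup("key-2"));
    }
}
```

With this shape the only disk write happens in the commit step, at the cost of
losing in-flight keys on a crash, which is the behavior described above.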
Another question: why do we need to return a flag to OzoneHandler to determine
whether a container needs to be created? I am wondering why we need these
additional RPC calls. Why not let SCM create the container on the datanodes if
necessary and simply return an open container to the client?
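A rough sketch of what I mean, under the assumption that SCM can track
per-container state. The class, enum, and method names here (ScmSketch, State,
createOnDatanodes) are made up for illustration, and the datanode RPC is
replaced by a counter.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; not the real SCM API. */
class ScmSketch {
    enum State { ALLOCATED, OPEN }

    static final class Container {
        final String name;
        State state = State.ALLOCATED;

        Container(String name) {
            this.name = name;
        }
    }

    private final Map<String, Container> containers = new HashMap<>();
    // Counts simulated "create container" requests sent to datanodes.
    int createCalls = 0;

    /** Stand-in for the RPC that creates the container on the datanodes. */
    private void createOnDatanodes(Container c) {
        createCalls++;
        c.state = State.OPEN;
    }

    /** Returns a container that is already open, creating it first if needed. */
    synchronized Container allocateBlock(String containerName) {
        Container c = containers.computeIfAbsent(containerName, Container::new);
        if (c.state != State.OPEN) {
            createOnDatanodes(c);
        }
        return c;
    }

    public static void main(String[] args) {
        ScmSketch scm = new ScmSketch();
        Container first = scm.allocateBlock("container-1");
        Container second = scm.allocateBlock("container-1");
        System.out.println(first.state + " " + (first == second)
            + " createCalls=" + scm.createCalls);
    }
}
```

Here the client never sees a "needs creation" flag; the create step runs at
most once per container inside SCM.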
Thanks.
> Ozone : improve error handling for putkey operation
> ---------------------------------------------------
>
> Key: HDFS-11886
> URL: https://issues.apache.org/jira/browse/HDFS-11886
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ozone
> Reporter: Chen Liang
> Attachments: design-notes-putkey.pdf
>
>
> Ozone's putKey operation involves a few steps:
> 1. KSM calls allocateBlock to SCM, writes this info to KSM's local metastore
> 2. allocatedBlock gets returned to client, client checks to see if container
> needs to be created on datanode, if yes, creates the container
> 3. client writes the data to the container.
> It is possible that step 1 succeeds but step 2 or 3 fails. In that case there
> will be an entry in KSM's local metastore, but the key is actually nowhere to
> be found. We need to revert step 1 if step 2 or 3 fails.
> To resolve this, we need at least two things to be implemented first.
> 1. We need deleteKey() to be added to KSM first.
> 2. We also need container reports to be implemented first such that SCM can
> track whether the container is actually added.