[ 
https://issues.apache.org/jira/browse/HDFS-11886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16026534#comment-16026534
 ] 

Chen Liang commented on HDFS-11886:
-----------------------------------

Thanks [~anu] for looking at this! No decision has been made yet for this 
JIRA, so any thoughts are more than welcome.

To make sure we are on the same page: did you mean that the client would send 
a "commit" message to KSM after the key has been written to the datanode, and 
only then would KSM write the key to ksm.db?

If I understand this correctly, with this approach every successful putKey 
makes exactly two calls to KSM: one to allocate the block and one to commit the 
key. A failed putKey makes only the first call, since there is no commit. With 
the revert-failed-key approach, a successful putKey makes one call to KSM (to 
allocate the block), while a failed putKey makes two (allocate, then revert the 
key). Assuming putKey is more likely to succeed than to fail, this seems to me 
a +1 for the revert-failure approach.
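
As a rough check of the call counts above, here is a small sketch that computes 
the expected number of KSM calls per putKey for each approach, given a failure 
rate p. The class and method names are made up for illustration and are not 
part of Ozone; it also assumes the commit or revert call itself never fails.

```java
// Hypothetical helper, not Ozone code: expected KSM calls per putKey.
final class KsmCallEstimate {
    // commit-success: allocateBlock always, plus a commit only when the write succeeds.
    static double commitOnSuccess(double failureRate) {
        return 1.0 + (1.0 - failureRate);
    }

    // revert-failure: allocateBlock always, plus a revert only when the write fails.
    static double revertOnFailure(double failureRate) {
        return 1.0 + failureRate;
    }

    public static void main(String[] args) {
        for (double p : new double[] {0.01, 0.25, 0.5}) {
            System.out.printf("p=%.2f commit=%.2f revert=%.2f%n",
                p, commitOnSuccess(p), revertOnFailure(p));
        }
    }
}
```

Under these assumptions the two approaches cost 2 - p and 1 + p calls 
respectively, so revert-failure makes fewer calls whenever p is below 0.5.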

However, the other question is how we can be sure that a key is finalized. For 
the commit-success approach this seems easy: unless the success flag is set, the 
key is considered not ready (similar to under construction). For the 
revert-failure approach, there is a temporary window in which a key has actually 
failed but has not yet been reverted, and someone may read it during that 
window. So this seems a +1 for the commit-success approach.
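
To make the visibility difference concrete, here is a toy in-memory model. It 
is only a sketch: ToyKeyStore, its methods, and the use of a HashMap in place 
of ksm.db are all assumptions for illustration, not the KSM implementation. 
It models the commit-success approach with a per-key flag.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Toy stand-in for ksm.db; every name here is hypothetical.
final class ToyKeyStore {
    private static final class Entry {
        final String blockId;
        boolean committed;

        Entry(String blockId, boolean committed) {
            this.blockId = blockId;
            this.committed = committed;
        }
    }

    private final Map<String, Entry> db = new HashMap<>();
    private final boolean requireCommit;

    // requireCommit = true models commit-success, false models revert-failure.
    ToyKeyStore(boolean requireCommit) {
        this.requireCommit = requireCommit;
    }

    // Step 1 of putKey: record the allocated block for the key.
    void allocate(String key, String blockId) {
        db.put(key, new Entry(blockId, !requireCommit));
    }

    // commit-success: the client confirms the data reached the datanode.
    void commit(String key) {
        Entry e = db.get(key);
        if (e != null) {
            e.committed = true;
        }
    }

    // revert-failure: the client reports that the write failed.
    void revert(String key) {
        db.remove(key);
    }

    // Readers only see keys that are marked committed.
    Optional<String> getKey(String key) {
        Entry e = db.get(key);
        return (e != null && e.committed) ? Optional.of(e.blockId) : Optional.empty();
    }

    public static void main(String[] args) {
        ToyKeyStore commitStyle = new ToyKeyStore(true);
        commitStyle.allocate("key1", "block1");
        System.out.println("commit-success, before commit: " + commitStyle.getKey("key1"));

        ToyKeyStore revertStyle = new ToyKeyStore(false);
        revertStyle.allocate("key1", "block1");
        System.out.println("revert-failure, before revert: " + revertStyle.getKey("key1"));
    }
}
```

In this model the revert-failure store returns the key between allocate() and 
revert(), which is the window described above, while the commit-success store 
returns nothing until commit() is called.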

In short, this probably comes down to whether we favor fewer RPC calls or a 
reliable getKey at any time.

> Ozone : improve error handling for putkey operation
> ---------------------------------------------------
>
>                 Key: HDFS-11886
>                 URL: https://issues.apache.org/jira/browse/HDFS-11886
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>            Reporter: Chen Liang
>
> Ozone's putKey operations involve a couple steps:
> 1. KSM calls allocateBlock to SCM, writes this info to KSM's local metastore
> 2. allocatedBlock gets returned to client, client checks to see if container 
> needs to be created on datanode, if yes, create the container
> 3. client writes the data to the container.
> It is possible that 1 succeeds but 2 or 3 fails. In this case there will 
> be an entry in KSM's local metastore, but the key is actually nowhere to be 
> found. We need to revert 1 if 2 or 3 fails. 
> To resolve this, we need at least two things to be implemented first.
> 1. We need deleteKey() to be added to KSM. 
> 2. We also need container reports to be implemented, so that SCM can 
> track whether the container has actually been added.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
