[ https://issues.apache.org/jira/browse/HDFS-3701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421175#comment-13421175 ]

Uma Maheswara Rao G commented on HDFS-3701:
-------------------------------------------

Yes, that should help us almost solve this problem.
In our internal branch (based on branch-1), we re-throw the exception after 
trying all the nodes, roughly as sketched below.
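
A minimal sketch of that change (illustrative only; fetchReplicaVisibleLength 
is a placeholder for the per-datanode length RPC, not the exact branch-1 code):

{code}
// Sketch: ask every datanode holding the last block for its length and
// re-throw after all of them fail, instead of silently returning 0.
private long readLastBlockLength(LocatedBlock last) throws IOException {
  IOException lastException = null;
  for (DatanodeInfo dn : last.getLocations()) {
    try {
      // placeholder for the per-datanode RPC that returns the replica length
      return fetchReplicaVisibleLength(dn, last.getBlock());
    } catch (IOException ioe) {
      lastException = ioe; // remember the failure, try the next replica
    }
  }
  // All replicas failed: surface the error rather than treating the
  // final block as empty, which is what loses data for the reader.
  throw new IOException(
      "Cannot obtain the last block length from any datanode", lastException);
}
{code}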

{quote}
To be clear, this issue is about data loss.
{quote}
Yes, Stack. I have seen this in my clusters. We solved it by adding the code 
proposed above together with HDFS-3222. 

( At that time I concentrated on fixing HDFS-3222 only on branch-2, but I 
should have proposed the changes for branch-1 as well :( . See the affected 
versions marked in HDFS-3222. ) One small gap I have seen in branch-1 is that 
acked bytes are not tracked as precisely as they are in hadoop-2 today. So, if 
we read the length from some other node that holds a shorter replica than the 
primary node, and the primary node connects back just before the actual read 
request starts, this kind of problem can still occur. I have seen another 
JIRA proposing that we mark the failed node in the dead-node list when we get 
RPC errors while fetching the length; that should help solve this issue 
(sketched below). I have not seen the problem recur after that fix in our 
internal branch.
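
As a delta on the loop sketched earlier, that suggestion would look roughly 
like this (addToDeadNodes stands in for the DFSClient's existing dead-node 
bookkeeping; the catch clause is illustrative):

{code}
} catch (IOException ioe) {
  // On an RPC error while fetching the length, also record the node in
  // the client's dead-node list so that later reads of this block skip
  // it and cannot pick up a stale, shorter length from it.
  addToDeadNodes(dn);
  lastException = ioe;
}
{code}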

So, I am +1 for doing that.

@Nicolas, do you have a patch ready for branch-1? If not, I will generate the 
patch on branch-1 sometime next week.


                
> HDFS may miss the final block when reading a file opened for writing if one 
> of the datanodes is dead
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3701
>                 URL: https://issues.apache.org/jira/browse/HDFS-3701
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 1.0.3
>            Reporter: nkeywal
>            Priority: Critical
>
> When a file is opened for writing, the DFSClient calls one of the datanodes 
> owning the last block to get its size. If this datanode is dead, the socket 
> exception is swallowed and the size of this last block is taken to be zero. 
> This seems to be fixed on trunk, but I didn't find a related Jira. On 1.0.3, 
> it's not fixed. It's in the same area as HDFS-1950 or HDFS-3222.
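
For contrast, the 1.0.3 behaviour described above amounts to roughly the 
following (an illustrative sketch, not the literal source; 
fetchReplicaVisibleLength again stands for the per-datanode length RPC):

{code}
long lastBlockLength = 0; // stays 0 if the datanode never answers
try {
  lastBlockLength = fetchReplicaVisibleLength(primary, last.getBlock());
} catch (IOException ioe) {
  // The socket exception is swallowed here, so the reader believes the
  // final block is empty and silently misses the data written to it.
}
{code}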
