[ https://issues.apache.org/jira/browse/HDFS-9909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226494#comment-15226494 ]

Bogdan Raducanu commented on HDFS-9909:
---------------------------------------

OK, thanks.
My first idea was simply to signal this case with a dedicated exception, e.g.
ReplicaWaitingRecoveryException, which is propagated to the client application.
The application can then decide what to do. In my case, when I get this
exception, I would call DFSClient.recoverLease to trigger lease recovery and
then open the file again. That is what I already do in the write case, before
the soft limit expires. In fact, as a workaround, I already do this for reads
too; the difference is that for reads I only get a generic IOException, so I
cannot tell this case apart from other failures. Just an idea; maybe you can
find something better.
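The client-side handling proposed above can be sketched as follows. Everything here is hypothetical: `ReplicaWaitingRecoveryException` is only the name suggested in this comment (HDFS 2.7 throws a plain IOException in this case), and `LeaseAwareFs`/`FakeFs` are toy stand-ins so the sketch runs without a cluster. In a real client the two operations would roughly correspond to opening the file and calling `DistributedFileSystem.recoverLease(Path)`, which returns true once the file is closed; that is why the sketch polls instead of calling it once.

```java
import java.io.IOException;

// Hypothetical exception proposed in the comment; NOT part of HDFS 2.7.
// Subclassing IOException keeps existing catch (IOException) handlers working.
class ReplicaWaitingRecoveryException extends IOException {
    ReplicaWaitingRecoveryException(String path) {
        super("Last block of " + path + " is waiting for recovery");
    }
}

// Hypothetical stand-in for the two file system operations the workaround needs.
interface LeaseAwareFs {
    String open(String path) throws IOException;          // toy model: returns the contents
    boolean recoverLease(String path) throws IOException; // true once the file is closed
}

final class RecoveringReader {
    // Opens the file; only on the specific "waiting for recovery" signal does it
    // trigger lease recovery, poll until the file is closed, and reopen it.
    static String openWithRecovery(LeaseAwareFs fs, String path, int maxPolls, long pollMillis)
            throws IOException, InterruptedException {
        try {
            return fs.open(path);
        } catch (ReplicaWaitingRecoveryException e) {
            for (int i = 0; i < maxPolls; i++) {
                if (fs.recoverLease(path)) {
                    return fs.open(path);
                }
                Thread.sleep(pollMillis);
            }
            throw new IOException("Lease recovery did not complete for " + path, e);
        }
        // Any other IOException propagates unchanged, without triggering recovery.
    }
}

// Toy file system: the last block stays "waiting for recovery" until
// recoverLease has been called pollsUntilClosed times.
final class FakeFs implements LeaseAwareFs {
    private int pollsUntilClosed;
    int recoverCalls = 0;

    FakeFs(int pollsUntilClosed) { this.pollsUntilClosed = pollsUntilClosed; }

    public String open(String path) throws IOException {
        if (pollsUntilClosed > 0) throw new ReplicaWaitingRecoveryException(path);
        return "data";
    }

    public boolean recoverLease(String path) {
        recoverCalls++;
        if (pollsUntilClosed > 0) pollsUntilClosed--;
        return pollsUntilClosed == 0;
    }
}
```

The point of a dedicated exception type is visible in the catch clause: the retry path is taken only for this case, while unrelated IOExceptions still reach the caller untouched.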

> Can't read file after hdfs restart
> ----------------------------------
>
>                 Key: HDFS-9909
>                 URL: https://issues.apache.org/jira/browse/HDFS-9909
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client, namenode
>    Affects Versions: 2.7.1, 2.7.2
>            Reporter: Bogdan Raducanu
>            Assignee: Xiao Chen
>            Priority: Critical
>         Attachments: Main.java
>
>
> If HDFS is restarted while a file is open for writing, new clients can't
> read that file until the hard lease limit expires and block recovery starts.
> Scenario:
> 1. Write to a file and call hflush.
> 2. Without closing the file, restart HDFS.
> 3. Once HDFS is back up, opening the file for reading from a new client fails
> for 1 hour.
> Repro attached.
> Thoughts:
> * This possibly also happens in cases other than a full HDFS restart
> (e.g. when only the datanodes in the pipeline are restarted).
> * As far as I can tell, this happens because the last block is in the RWR
> (replica waiting to be recovered) state and getReplicaVisibleLength returns
> -1 for it. Recovery starts only after the hard lease limit expires, so the
> file becomes readable only after 1 hour.
> * One can call recoverLease to start lease recovery sooner, BUT how can one
> know when to call it? The exception thrown is a plain IOException, which can
> also happen for other reasons.
> I think a reasonable solution would be to throw a specialized exception
> (similar to AlreadyBeingCreatedException when trying to write to an open file).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
