[jira] [Commented] (HADOOP-8640) DU thread transient failures propagate to callers

Wei-Chiu Chuang (JIRA) Thu, 19 May 2016 15:00:09 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-8640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292227#comment-15292227 ]


Wei-Chiu Chuang commented on HADOOP-8640:
-----------------------------------------

HADOOP-12973 unintentionally fixed this bug.

> DU thread transient failures propagate to callers
> -------------------------------------------------
>
>                 Key: HADOOP-8640
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8640
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs, io
>    Affects Versions: 2.0.0-alpha, 1.2.1
>            Reporter: Todd Lipcon
>
> When running some stress tests, I saw a failure where the DURefreshThread 
> failed due to the filesystem changing underneath it:
> {code}
> org.apache.hadoop.util.Shell$ExitCodeException: du: cannot access 
> `/data/4/dfs/dn/current/BP-1928785663-172.20.90.20-1343880685858/current/rbw/blk_4637779214690837894':
>  No such file or directory
> {code}
> (the block was probably finalized while the du process was running, which 
> caused it to fail)
> The next block write then called {{getUsed()}}, and the exception was 
> propagated to it, causing the write to fail. Since it was a 
> pseudo-distributed cluster, the client was unable to pick a different node 
> to write to and failed.
> The current behavior of propagating the exception to the next (and only the 
> next) caller doesn't seem well-thought-out.
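
For illustration only, here is a minimal sketch of one way a cached usage value could tolerate a transient refresh failure instead of handing the exception to the next caller. It is not the Hadoop DU class and not the change made in HADOOP-12973; the names CachedUsage, UsageProbe, and getFailedRefreshes are hypothetical, and the probe stands in for running du on a directory.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; this is NOT the Hadoop DU implementation. */
final class CachedUsage {

    /** Hypothetical stand-in for shelling out to `du` on a directory. */
    interface UsageProbe {
        long measure() throws IOException;
    }

    private final UsageProbe probe;
    private final AtomicLong used = new AtomicLong();
    private final AtomicLong failedRefreshes = new AtomicLong();

    CachedUsage(UsageProbe probe, long initialUsed) {
        this.probe = probe;
        this.used.set(initialUsed);
    }

    /** Intended to be called periodically by a background refresh thread. */
    void refresh() {
        try {
            used.set(probe.measure());
        } catch (IOException e) {
            // Transient failure (for example, a file vanished mid-scan):
            // keep the last good value and count the failure rather than
            // storing the exception for the next getUsed() caller.
            failedRefreshes.incrementAndGet();
        }
    }

    /** Returns the last successfully measured value; never throws. */
    long getUsed() {
        return used.get();
    }

    /** Number of refresh attempts that failed, for monitoring. */
    long getFailedRefreshes() {
        return failedRefreshes.get();
    }
}
```

The trade-off in this sketch is that callers may see a slightly stale value after a failed refresh; the failure counter is there so that repeated failures can still be noticed rather than silently ignored.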



