[ https://issues.apache.org/jira/browse/HADOOP-8640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wei-Chiu Chuang resolved HADOOP-8640.
-------------------------------------
    Resolution: Won't Fix

Given that the refactor in HADOOP-12973 unintentionally eliminated this problem in 2.8.0 and above, I'll mark this as a won't fix.

> DU thread transient failures propagate to callers
> -------------------------------------------------
>
>                 Key: HADOOP-8640
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8640
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs, io
>    Affects Versions: 2.0.0-alpha, 1.2.1
>            Reporter: Todd Lipcon
>            Priority: Major
>
> When running some stress tests, I saw a failure where the DURefreshThread
> failed due to the filesystem changing underneath it:
> {code}
> org.apache.hadoop.util.Shell$ExitCodeException: du: cannot access
> `/data/4/dfs/dn/current/BP-1928785663-172.20.90.20-1343880685858/current/rbw/blk_4637779214690837894':
> No such file or directory
> {code}
> (the block was probably finalized while the du process was running, which
> caused it to fail)
> The next block write, then, called {{getUsed()}}, and the exception got
> propagated causing the write to fail. Since it was a pseudo-distributed
> cluster, the client was unable to pick a different node to write to and
> failed.
> The current behavior of propagating the exception to the next (and only the
> next) caller doesn't seem well-thought-out.
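
For readers unfamiliar with the failure mode in the quoted report, the sketch below may help. It is only an illustration: UsageCache and all of its methods are hypothetical names made up for this example, not Hadoop's DU class and not the code from HADOOP-12973. It contrasts handing a stored refresh failure to the next (and only the next) caller with simply returning the last value that was measured successfully.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a cached disk-usage value; NOT Hadoop's DU class. */
class UsageCache {
    // Last usage figure obtained from a successful refresh.
    private final AtomicLong used;
    // Failure recorded by the most recent unsuccessful refresh, if any.
    private final AtomicReference<IOException> lastFailure = new AtomicReference<>();

    UsageCache(long initialUsed) {
        this.used = new AtomicLong(initialUsed);
    }

    /** Would be called by a background refresh thread after a successful scan. */
    void recordSuccess(long bytes) {
        used.set(bytes);
        lastFailure.set(null);
    }

    /** Would be called by a background refresh thread when the scan fails. */
    void recordFailure(IOException e) {
        lastFailure.set(e);
    }

    /**
     * Mirrors the behavior the report describes: a stored failure is thrown
     * to the next caller, then cleared, so later callers succeed again.
     */
    long getUsedPropagating() throws IOException {
        IOException e = lastFailure.getAndSet(null);
        if (e != null) {
            throw e;
        }
        return used.get();
    }

    /**
     * One possible alternative (an assumption, not the actual fix): ignore
     * transient refresh failures and return the last good value.
     */
    long getUsedTolerant() {
        return used.get();
    }
}
```

With this sketch, a single recordFailure() call makes exactly one later getUsedPropagating() call throw, regardless of which caller that happens to be, while getUsedTolerant() keeps returning the previous figure until the next successful refresh.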