[
https://issues.apache.org/jira/browse/HADOOP-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564803#action_12564803
]
Raghu Angadi commented on HADOOP-2647:
--------------------------------------
> My vote would be to do nothing on 0.16.
+1.
We can close this JIRA. Error messages, etc., could be changed later as part of
some other JIRA. I think there are no plans to fix this for 0.15.
> dfs -put hangs
> --------------
>
> Key: HADOOP-2647
> URL: https://issues.apache.org/jira/browse/HADOOP-2647
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.15.1
> Environment: LINUX
> Reporter: lohit vijayarenu
> Assignee: Raghu Angadi
> Fix For: 0.16.1
>
> Attachments: HADOOP-2647.patch
>
>
> We saw a case where dfs -put hung for over 20 hours while copying a 2GB file.
> When we looked at the stack trace of the process, the main thread was waiting
> for the namenode to confirm the complete status.
> Only 4 blocks were copied, and the 5th block was missing when we ran fsck on the
> partially transferred file.
> We saw 2 problems here.
> 1. The DFS client hung without a timeout when there was no response from the namenode.
> 2. In IOUtils::copyBytes(InputStream in, OutputStream out, int buffSize,
> boolean close), if an exception occurs during the copy, out.close() is called
> and the exception is not caught. This is why we see a close call in the stack trace.
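Problem 2 is about what happens to the original exception when close() runs on the failure path. A minimal sketch, assuming Java 7+ try-with-resources, of a copy that still closes both streams but never lets a close failure replace the copy failure: the close exception is attached as a suppressed exception instead. The class name SafeCopy is hypothetical; this is not Hadoop's IOUtils, and it does nothing about a close() that blocks forever (that is problem 1).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration only; not org.apache.hadoop.io.IOUtils.
public class SafeCopy {

    // Copies every byte from in to out using a buffer of buffSize bytes.
    private static void copy(InputStream in, OutputStream out, int buffSize)
            throws IOException {
        byte[] buf = new byte[buffSize];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    // If close is true, both streams are closed afterwards (out first, then in).
    // When the copy itself throws, any exception from close() is recorded via
    // Throwable.addSuppressed, so the caller still sees the original failure.
    public static void copyBytes(InputStream in, OutputStream out, int buffSize,
                                 boolean close) throws IOException {
        if (!close) {
            copy(in, out, buffSize);
            return;
        }
        // try-with-resources closes resources in reverse declaration order.
        try (InputStream i = in; OutputStream o = out) {
            copy(i, o, buffSize);
        }
    }
}
```

A caller that catches the IOException can inspect getSuppressed() to see whether closing the output stream also failed, rather than only ever seeing the close error.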
> We checked the block IDs in the namenode log. For the missing block, there
> was only one response to the namenode instead of three.
> This close state, coupled with the namenode not reporting the error back, might
> have caused the whole process to hang.
> Opening this JIRA to see if we could add checks for the two problems mentioned
> above.
> <stack trace of main thread>
> "main" prio=10 tid=0x0805a000 nid=0x5b53 waiting on condition [0xf7e64000..0xf7e65288]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>     at java.lang.Thread.sleep(Native Method)
>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1751)
>     - locked <0x77d593a0> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
>     at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
>     at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:55)
>     at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:83)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:140)
>     at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:826)
>     at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:114)
>     at org.apache.hadoop.fs.FsShell.run(FsShell.java:1354)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>     at org.apache.hadoop.fs.FsShell.main(FsShell.java:1472)
> </stack trace of main thread>
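The trace shows DFSOutputStream.close() sleeping while it waits for the namenode to report the file complete, which is problem 1 above. A minimal sketch of a bounded wait, assuming the completion query can be wrapped as a callable check: CompletionCheck, awaitComplete, and any timeout or poll values are hypothetical, not DFSClient's actual API or configuration, and this is not presented as the content of the HADOOP-2647 patch.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names and parameters are illustrative only.
public class BoundedCompletion {

    // Hypothetical stand-in for asking the namenode "is this file complete?".
    @FunctionalInterface
    public interface CompletionCheck {
        boolean isComplete() throws IOException;
    }

    // Polls check until it returns true, sleeping pollMillis between attempts.
    // Throws an IOException once timeoutMillis has elapsed instead of looping
    // forever, and converts thread interruption into InterruptedIOException.
    public static void awaitComplete(CompletionCheck check, long timeoutMillis,
                                     long pollMillis) throws IOException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!check.isComplete()) {
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                throw new IOException("file not reported complete within "
                        + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                throw new InterruptedIOException("interrupted while waiting for completion");
            }
        }
    }
}
```

With a deadline, a namenode that never confirms completion surfaces as an IOException that the shell can report, rather than an indefinite sleep inside close().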
--
This message is automatically generated by JIRA.