[ 
https://issues.apache.org/jira/browse/HADOOP-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi reassigned HADOOP-2647:
------------------------------------

    Assignee: Raghu Angadi

> dfs -put hangs
> --------------
>
>                 Key: HADOOP-2647
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2647
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.15.1
>         Environment: LINUX
>            Reporter: lohit vijayarenu
>            Assignee: Raghu Angadi
>
> We saw a case where dfs -put hung for over 20 hours while copying a 2GB file.
> When we looked at the stack trace of the process, the main thread was waiting
> for confirmation of complete status from the namenode.
> Only 4 blocks had been copied; the 5th block was missing when we ran fsck on
> the partially transferred file.
> We saw 2 problems here.
> 1. The DFS client hung without a timeout when there was no response from the
> namenode.
> 2. In IOUtils::copyBytes(InputStream in, OutputStream out, int buffSize,
> boolean close):
> If there is an exception during the copy, out.close() is called and the
> exception is not caught, which is why we see a close call in the stack trace.
> When we checked the namenode log for block IDs, the missing block had only one
> response to the namenode instead of three.
> This close state, coupled with the namenode not reporting the error back, might
> have caused the whole process to hang.
> Opening this JIRA to see if we could add checks for the two problems mentioned
> above.
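
For problem 2, here is a minimal sketch of a copy helper that keeps the original copy exception visible when the copy fails. It is not the actual org.apache.hadoop.io.IOUtils code; the class name SafeCopy and the helper closeQuietly are made up for illustration, and only the copyBytes parameter list is taken from the description.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not the Hadoop IOUtils implementation.
class SafeCopy {

    // Copies everything from in to out. If the copy succeeds, the streams are
    // closed normally, so a failing close() is still reported. If the copy
    // fails, the streams are closed quietly, so the caller sees the original
    // copy exception rather than one thrown by close().
    static void copyBytes(InputStream in, OutputStream out, int buffSize, boolean close)
            throws IOException {
        boolean succeeded = false;
        try {
            byte[] buf = new byte[buffSize];
            int n;
            while ((n = in.read(buf)) >= 0) {
                out.write(buf, 0, n);
            }
            succeeded = true;
        } finally {
            if (close) {
                if (succeeded) {
                    try {
                        out.close();
                    } finally {
                        in.close();
                    }
                } else {
                    closeQuietly(out);
                    closeQuietly(in);
                }
            }
        }
    }

    // Closes a stream and only logs a failure instead of throwing it.
    static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException e) {
            System.err.println("Ignoring exception from close(): " + e);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        copyBytes(new ByteArrayInputStream(data), sink, 2, true);
        System.out.println(new String(sink.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Note that this only addresses which exception is reported. A close() that blocks indefinitely, as in the stack trace, would still hang here and needs a timeout of its own.
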
> <stack trace of main thread>
> "main" prio=10 tid=0x0805a000 nid=0x5b53 waiting on condition [0xf7e64000..0xf7e65288]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1751)
>   - locked <0x77d593a0> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
>   at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
>   at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:55)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:83)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:140)
>   at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:826)
>   at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:114)
>   at org.apache.hadoop.fs.FsShell.run(FsShell.java:1354)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>   at org.apache.hadoop.fs.FsShell.main(FsShell.java:1472)
> </stack trace of main thread>
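
For problem 1, the trace shows the main thread sleeping inside DFSOutputStream.close(), which is consistent with a sleep-and-retry wait that has no upper bound. The DFSClient code itself is not shown in this message, so the following is only a sketch of a bounded wait. BoundedWait, CompletionCheck, and the timeout and poll parameters are all hypothetical stand-ins, with CompletionCheck playing the role of the namenode "complete" call.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a bounded wait; not the DFSClient implementation.
class BoundedWait {

    // Stand-in for asking the namenode whether the file is complete.
    interface CompletionCheck {
        boolean isComplete() throws IOException;
    }

    // Polls check until it returns true. Throws an IOException once timeoutMs
    // has elapsed, instead of sleeping and retrying forever.
    static void waitForCompletion(CompletionCheck check, long timeoutMs, long pollMs)
            throws IOException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!check.isComplete()) {
            // Overflow-safe comparison against the deadline.
            if (System.nanoTime() - deadline >= 0) {
                throw new IOException(
                    "Timed out after " + timeoutMs + " ms waiting for file completion");
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and surface the interruption.
                Thread.currentThread().interrupt();
                throw new InterruptedIOException(
                    "Interrupted while waiting for file completion");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int[] calls = {0};
        // Simulated check that succeeds on the third poll.
        waitForCompletion(() -> ++calls[0] >= 3, 1000, 10);
        System.out.println("completed after " + calls[0] + " checks");

        try {
            // Simulated namenode that never confirms completion.
            waitForCompletion(() -> false, 50, 10);
        } catch (IOException e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

With a deadline like this, dfs -put would fail with an error after the configured time rather than hang for 20 hours. What a suitable timeout value would be is a separate question.
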

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
