[ https://issues.apache.org/jira/browse/MAPREDUCE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allen Wittenauer resolved MAPREDUCE-3276.
-----------------------------------------
    Resolution: Later

This issue is pretty stale at this point. Closing as Later. If it is still a problem, then please open a new jira.

> hadoop dfs -copyToLocal/copyFromLocal called within Hadoop Streaming returns early
> ----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3276
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3276
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>   Affects Versions: 0.20.2
>         Environment: Linux RedHat Enterprise Linux 5.
> 31 node cluster with 1 as JobTracker and NameNode, and 30 as TaskTracker and DataNode.
>            Reporter: Keith Stevens
>              Labels: hadoop, shell, streaming
>
> I'm using the Cloudera Hadoop release 0.20.2.+737 to parallelize bash scripts with Hadoop Streaming.
> Below is an example script that I've been running, which simply copies a file from HDFS to a local node.
> {code:title=SampleMapper.sh|borderStyle=solid}
> hadoop dfs -copyToLocal /path/to/some/large/file/myFile myFile
> # Spin until the file is fully copied.
> while [ ! -f myFile ]
> do
>   echo "spin"
>   sleep 1
> done
> {code}
> Surprisingly, the copy call returns before the file is copied if the file is sufficiently large, and the while loop spins for several iterations. I'm seeing similar behavior with copyFromLocal.
> I've asked about this issue on other forums and no one else seems to have had the problem, although I don't know how many people are attempting to do this particular task.
> Has this been fixed in more recent versions of Hadoop?
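
The issue was closed without a diagnosis, so the following is only a defensive sketch for anyone hitting the same symptom, not a confirmed fix. The loop in SampleMapper.sh tests only whether myFile exists, which cannot tell a partially written file from a complete one, and its echo goes to stdout, which Hadoop Streaming treats as mapper output. The sketch below instead checks the exit status of the copy and then waits until the file size stops changing. The helper names file_size, wait_for_stable_size and fetch_from_hdfs are hypothetical (they are not part of Hadoop), and it assumes that hadoop dfs -copyToLocal reports failure through a non-zero exit status.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical and not part of Hadoop.

# Print the size of a file in bytes, or -1 if it does not exist.
file_size() {
    if [ -f "$1" ]; then
        wc -c < "$1" | tr -d ' '
    else
        echo -1
    fi
}

# Poll until the file exists and its size is identical on two
# consecutive polls, or give up after MAX_TRIES polls.
# Usage: wait_for_stable_size FILE [INTERVAL_SECONDS] [MAX_TRIES]
wait_for_stable_size() {
    local file interval max_tries prev cur tries
    file=$1
    interval=${2:-1}
    max_tries=${3:-60}
    prev=-1
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        cur=$(file_size "$file")
        if [ "$cur" -ge 0 ] && [ "$cur" -eq "$prev" ]; then
            return 0
        fi
        prev=$cur
        tries=$((tries + 1))
        sleep "$interval"
    done
    return 1
}

# Copy SRC from HDFS to DEST, then wait for DEST to settle.
# Assumption: a failed copy yields a non-zero exit status.
# Diagnostics go to stderr so they do not end up in the map output.
fetch_from_hdfs() {
    local src dest
    src=$1
    dest=$2
    if ! hadoop dfs -copyToLocal "$src" "$dest"; then
        echo "copyToLocal failed for $src" >&2
        return 1
    fi
    wait_for_stable_size "$dest"
}
```

In a mapper this would be called as fetch_from_hdfs /path/to/some/large/file/myFile myFile || exit 1. A stable size is only a heuristic: it cannot prove the copy is complete, so comparing against the size or checksum of the HDFS source would be a stronger check where that is practical.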