Liu Xiao created MAPREDUCE-6249:
-----------------------------------

             Summary: Streaming task will not untar tgz uploaded with -archives
                 Key: MAPREDUCE-6249
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6249
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: contrib/streaming
    Affects Versions: 2.5.2
         Environment: hadoop-2.5.2
                      hadoop-streaming-2.5.2.jar
            Reporter: Liu Xiao

While writing a Hadoop streaming task, I used -archives to upload a tgz from the local machine to the HDFS task working directory, but it was not untarred as the documentation says it should be. I have searched a lot without any luck.

Here is the command that starts the streaming task with hadoop-2.5.2:

    hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.2.jar \
        -files mapper.sh -archives /home/hadoop/tmp/test.tgz#test \
        -D mapreduce.job.maps=1 \
        -D mapreduce.job.reduces=1 \
        -input "/test/test.txt" \
        -output "/res/" \
        -mapper "sh mapper.sh" \
        -reducer "cat"

and "mapper.sh":

    cat > /dev/null
    ls -l test
    exit 0

"test.tgz" contains two files, "test.1.txt" and "test.2.txt":

    echo "abcd" > test.1.txt
    echo "efgh" > test.2.txt
    tar zcvf test.tgz test.1.txt test.2.txt

The output from the task above is:

    lrwxrwxrwx 1 hadoop hadoop 71 Feb 8 23:25 test -> /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/116/test.tgz

but the desired output would be something like this:

    -rw-r--r-- 1 hadoop hadoop 5 Feb 8 23:25 test.1.txt
    -rw-r--r-- 1 hadoop hadoop 5 Feb 8 23:25 test.2.txt

So why has test.tgz not been untarred automatically as the documentation says, or is there another way to get the "tgz" untarred?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
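
Until the cause is understood, one possible stopgap is to have the mapper unpack the archive itself when the "test" link resolves to the raw tgz, as it does in the output above. The following is only a sketch under that assumption. The helper name unpack_if_archive and the ".d" suffix for the extraction directory are hypothetical and not part of Hadoop, and the sketch does not explain or fix the underlying behaviour.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): print the directory that holds
# the contents of NAME, unpacking NAME first if it is still a .tgz file.
unpack_if_archive() {
    name="$1"
    # Already a directory: the framework unpacked it, so use it as-is.
    if [ -d "$name" ]; then
        echo "$name"
        return 0
    fi
    # Neither a directory nor a file (test -f follows symlinks): give up.
    if [ ! -f "$name" ]; then
        echo "unpack_if_archive: $name not found" >&2
        return 1
    fi
    # Extract next to the link so the localized file itself is untouched.
    mkdir -p "$name.d" || return 1
    tar -xzf "$name" -C "$name.d" || return 1
    echo "$name.d"
}
```

In "mapper.sh" this could be called as `dir=$(unpack_if_archive test) && ls -l "$dir"` in place of `ls -l test`. Extracting into a separate directory rather than over the link keeps the file localized by the NodeManager unmodified.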