Liu Xiao created MAPREDUCE-6249:
-----------------------------------
Summary: Streaming task will not untar tgz uploaded with -archives
Key: MAPREDUCE-6249
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6249
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/streaming
Affects Versions: 2.5.2
Environment: hadoop-2.5.2
hadoop-streaming-2.5.2.jar
Reporter: Liu Xiao
While writing a Hadoop streaming task, I used -archives to upload a tgz from the
local machine to the task working directory, but it has not been untarred as the
documentation says. I've searched a lot without any luck.
Here is the command used to start the streaming task with hadoop-2.5.2:
hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.2.jar \
-files mapper.sh \
-archives /home/hadoop/tmp/test.tgz#test \
-D mapreduce.job.maps=1 \
-D mapreduce.job.reduces=1 \
-input "/test/test.txt" \
-output "/res/" \
-mapper "sh mapper.sh" \
-reducer "cat"
The contents of "mapper.sh":
cat > /dev/null
ls -l test
exit 0
In "test.tgz" there are two files, "test.1.txt" and "test.2.txt", created with:
echo "abcd" > test.1.txt
echo "efgh" > test.2.txt
tar zcvf test.tgz test.1.txt test.2.txt
The output from the above task:
lrwxrwxrwx 1 hadoop hadoop 71 Feb 8 23:25 test -> /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/116/test.tgz
The desired output would be something like this:
-rw-r--r-- 1 hadoop hadoop 5 Feb 8 23:25 test.1.txt
-rw-r--r-- 1 hadoop hadoop 5 Feb 8 23:25 test.2.txt
So why has test.tgz not been untarred automatically as the documentation says,
and is there another way to get the tgz untarred?
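
One thing worth checking, which this report does not confirm: "ls -l" given a
symlink on the command line prints the link itself, not the directory it points
to. If the archive was unpacked into a directory that keeps the name test.tgz
(an assumption here), the output above would look the same as for an
un-extracted file. The sketch below reproduces that situation locally, without
Hadoop. The helper name unpacked_dir and the ".unpacked" suffix are
hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): prints a directory
# that holds the archive contents, unpacking by hand only if needed.
unpacked_dir() {
    link="$1"
    if [ -d "$link" ]; then
        # [ -d ] follows symlinks: the link already resolves to a directory.
        echo "$link"
    elif [ -f "$link" ]; then
        # The link resolves to a regular file: extract it manually.
        out="${link}.unpacked"
        mkdir -p "$out" && tar -xzf "$link" -C "$out" && echo "$out"
    else
        echo "cannot resolve $link" >&2
        return 1
    fi
}

# Local reproduction: a directory whose name ends in .tgz, reached
# through a symlink named "test", as in the reported output.
demo=$(mktemp -d)
mkdir "$demo/test.tgz"
echo "abcd" > "$demo/test.tgz/test.1.txt"
echo "efgh" > "$demo/test.tgz/test.2.txt"
ln -s "$demo/test.tgz" "$demo/test"

ls -l "$demo/test"     # prints only the symlink entry
ls -l "$demo/test/"    # trailing slash: lists the two files
ls -l "$(unpacked_dir "$demo/test")/"
```

In mapper.sh, replacing "ls -l test" with "ls -l test/" (or with the last line
of the sketch, using "test" as the argument) would show whether the link
resolves to an extracted directory or to the tgz file itself.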
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)