Hadoop Streaming does not magically teach Python's open() how to read from "hdfs://" URLs. You'll need to use a library or spawn an "hdfs dfs -cat" subprocess to read the file for you.
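Here is a minimal sketch of the subprocess route, assuming the "hdfs" command is on the PATH of the task nodes. The names hdfs_lines and mapper_records and the cat_cmd parameter are made up for illustration, and since your script is cut off after open(), what you do with each line is a placeholder:

```python
import subprocess
import sys


def hdfs_lines(path, cat_cmd=("hdfs", "dfs", "-cat")):
    """Yield the lines of `path` by streaming the output of `hdfs dfs -cat`.

    `cat_cmd` is a hypothetical hook so the command can be swapped out,
    for example with plain `cat` when testing without a cluster.
    """
    proc = subprocess.Popen(list(cat_cmd) + [path],
                            stdout=subprocess.PIPE,
                            universal_newlines=True)
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise IOError("%s exited with status %d"
                      % (" ".join(cat_cmd), returncode))


def mapper_records(lines, base_dir="/user/hdfs/catalog3/",
                   cat_cmd=("hdfs", "dfs", "-cat")):
    """For each "offset<TAB>filename" input line, yield that file's lines."""
    for line in lines:
        # Strip the newline first, otherwise it ends up in the filename.
        offset, filename = line.rstrip("\n").split("\t", 1)
        # Plain path rather than "hdfs://user/...": in that URL form,
        # "user" would be parsed as the namenode host, not a directory.
        for record in hdfs_lines(base_dir + filename, cat_cmd):
            yield record
```

In the streaming script you would then loop over mapper_records(sys.stdin) and emit whatever your job needs. A non-zero exit from "hdfs dfs -cat" (a missing file, say) surfaces as an IOError instead of the task dying silently.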
A few links that may help:

http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
http://stackoverflow.com/questions/12485718/python-read-file-as-stream-from-hdfs
https://bitbucket.org/turnaev/cyhdfs

-andy

On Sat, Jan 12, 2013 at 12:30 AM, springring <springr...@126.com> wrote:

> Hi,
>
> When I run code below as a streaming, the job error N/A and killed. I
> run step by step, find it error when
> " file_obj = open(file) " . When I run same code outside of hadoop,
> everything is ok.
>
> #!/bin/env python
>
> import sys
>
> for line in sys.stdin:
>     offset,filename = line.split("\t")
>     file = "hdfs://user/hdfs/catalog3/" + filename
>     print line
>     print filename
>     print file
>     file_obj = open(file)
> ..................................
>