Oh, another link I should have included! http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/
-andy

On Mon, Jan 14, 2013 at 2:19 PM, Andy Isaacson <a...@cloudera.com> wrote:
> Hadoop Streaming does not magically teach Python open() how to read
> from "hdfs://" URLs. You'll need to use a library or fork a "hdfs dfs
> -cat" to read the file for you.
>
> A few links that may help:
>
> http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
> http://stackoverflow.com/questions/12485718/python-read-file-as-stream-from-hdfs
> https://bitbucket.org/turnaev/cyhdfs
>
> -andy
>
> On Sat, Jan 12, 2013 at 12:30 AM, springring <springr...@126.com> wrote:
>> Hi,
>>
>> When I run code below as a streaming, the job error N/A and killed. I
>> run step by step, find it error when
>> " file_obj = open(file) " . When I run same code outside of hadoop,
>> everything is ok.
>>
>> #!/bin/env python
>>
>> import sys
>>
>> for line in sys.stdin:
>>     offset,filename = line.split("\t")
>>     file = "hdfs://user/hdfs/catalog3/" + filename
>>     print line
>>     print filename
>>     print file
>>     file_obj = open(file)
>> ..................................
>>
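
For reference, the 'fork a "hdfs dfs -cat"' suggestion quoted above might look roughly like the sketch below. It is not tested against a real cluster, and it makes a few assumptions: the `hdfs` command is on the PATH of the task nodes, the intended directory is `/user/hdfs/catalog3/` (as far as I can tell, in "hdfs://user/hdfs/catalog3/" the word "user" would be parsed as the namenode host rather than as part of the path), and the names `hdfs_cat_lines`, `cat_listed_files`, `cat_cmd` and `base_dir` are made up for illustration.

```python
import subprocess
import sys

# Assumption: the "hdfs" launcher is on the PATH of every task node.
HDFS_CAT = ("hdfs", "dfs", "-cat")


def hdfs_cat_lines(path, cat_cmd=HDFS_CAT):
    """Yield the lines of `path` by forking `cat_cmd path` and reading its stdout."""
    proc = subprocess.Popen(list(cat_cmd) + [path],
                            stdout=subprocess.PIPE,
                            universal_newlines=True)
    try:
        for line in proc.stdout:
            yield line
    finally:
        # Runs on normal completion and when the caller stops reading early.
        proc.stdout.close()
        status = proc.wait()
    if status != 0:
        raise IOError("%s %s exited with status %d"
                      % (" ".join(cat_cmd), path, status))


def cat_listed_files(records, base_dir="/user/hdfs/catalog3/", cat_cmd=HDFS_CAT):
    """For each "offset<TAB>filename" record, yield the lines of that file."""
    for record in records:
        # Drop the trailing newline, otherwise it ends up inside the filename.
        offset, filename = record.rstrip("\n").split("\t", 1)
        for line in hdfs_cat_lines(base_dir + filename, cat_cmd):
            yield line


def main():
    # Streaming mapper body: records arrive on stdin, output goes to stdout.
    for line in cat_listed_files(sys.stdin):
        sys.stdout.write(line)
```

Calling `main()` at the bottom of the script would turn this into a streaming mapper. Forking one process per input record is slow if the list of files is long, so one of the libraries from the links above may be a better fit in that case.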