Hadoop Streaming does not magically teach Python's open() how to read
from "hdfs://" URLs. You'll need to use an HDFS client library or fork
an "hdfs dfs -cat" process to read the file for you.
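A minimal sketch of the fork approach (assumes the `hdfs` CLI is on the
task's PATH; `hdfs_open` is a hypothetical helper, not a standard API):

```python
import subprocess

def hdfs_cat_command(path):
    # Argument list for the external "hdfs dfs -cat" call.
    return ["hdfs", "dfs", "-cat", path]

def hdfs_open(path):
    # Fork the command and hand back a file-like object over its stdout.
    return subprocess.Popen(hdfs_cat_command(path),
                            stdout=subprocess.PIPE).stdout
```

The returned pipe can be iterated line by line like any file object;
wait() on the child process afterwards if you care about its exit status.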

A few links that may help:

http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
http://stackoverflow.com/questions/12485718/python-read-file-as-stream-from-hdfs
https://bitbucket.org/turnaev/cyhdfs

-andy

On Sat, Jan 12, 2013 at 12:30 AM, springring <springr...@126.com> wrote:
> Hi,
>
>      When I run the code below as a Streaming job, the job errors out 
> with status N/A and is killed.  Stepping through, it fails at
> "file_obj = open(file)".  When I run the same code outside of Hadoop, 
> everything works fine.
>
> #!/bin/env python
>
> import sys
>
> for line in sys.stdin:
>     offset,filename = line.split("\t")
>     file = "hdfs://user/hdfs/catalog3/" + filename
>     print line
>     print filename
>     print file
>     file_obj = open(file)
> ..................................
>
