Oh, another link I should have included!
http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/
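
To make the "fork a hdfs dfs -cat" suggestion from my earlier message
concrete, here is a minimal sketch of a streaming mapper that reads each
named file through a subprocess instead of open(). The names hdfs_lines,
run_mapper and HDFS_DIR are made up for illustration, and it assumes the
"hdfs" command is on the PATH of every task node. Two guesses about the
original script are baked in: "hdfs://user/..." most likely needs three
slashes (otherwise "user" is parsed as the namenode host), and the
filename needs its trailing newline stripped before use.

```python
import subprocess
import sys

# Assumption: three slashes, so "user" is a path component, not a host name.
HDFS_DIR = "hdfs:///user/hdfs/catalog3/"


def hdfs_lines(path, cat_cmd=("hdfs", "dfs", "-cat")):
    """Yield the lines of `path` by streaming the output of `hdfs dfs -cat`."""
    proc = subprocess.Popen(list(cat_cmd) + [path], stdout=subprocess.PIPE)
    try:
        for raw in proc.stdout:
            yield raw.decode("utf-8", "replace")
    finally:
        proc.stdout.close()
        status = proc.wait()
    # Only reached when the output was read to the end.
    if status != 0:
        raise IOError("%s exited with status %d" % (" ".join(cat_cmd), status))


def run_mapper(stdin, stdout, base=HDFS_DIR, cat_cmd=("hdfs", "dfs", "-cat")):
    """For each "offset<TAB>filename" input record, copy that file to stdout."""
    for line in stdin:
        # Drop the newline, which would otherwise become part of the file name.
        offset, filename = line.rstrip("\n").split("\t", 1)
        for text in hdfs_lines(base + filename, cat_cmd):
            stdout.write(text)
```

To use it as the mapper script, end the file with
run_mapper(sys.stdin, sys.stdout). Spawning one JVM per input record is
slow, so for many small files one of the libraries linked below is
probably the better choice.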

-andy

On Mon, Jan 14, 2013 at 2:19 PM, Andy Isaacson <a...@cloudera.com> wrote:
> Hadoop Streaming does not magically teach Python's open() how to read
> from "hdfs://" URLs. You'll need to use an HDFS client library or fork
> an "hdfs dfs -cat" subprocess to read the file for you.
>
> A few links that may help:
>
> http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
> http://stackoverflow.com/questions/12485718/python-read-file-as-stream-from-hdfs
> https://bitbucket.org/turnaev/cyhdfs
>
> -andy
>
> On Sat, Jan 12, 2013 at 12:30 AM, springring <springr...@126.com> wrote:
>> Hi,
>>
>>      When I run the code below as a streaming job, the job fails with
>> error N/A and is killed. Stepping through it, I found that the error
>> occurs at " file_obj = open(file) ". When I run the same code outside
>> of Hadoop, everything is OK.
>>
>> #!/bin/env python
>>
>> import sys
>>
>> for line in sys.stdin:
>>     offset,filename = line.split("\t")
>>     file = "hdfs://user/hdfs/catalog3/" + filename
>>     print line
>>     print filename
>>     print file
>>     file_obj = open(file)
>> ..................................
>>
