On 09/05/2011 05:42 PM, Sofia Georgiakaki wrote:
Good evening,

this topic seems very interesting. To make sure I understood correctly:
do you mean that I can write a simple Java program and access a file
stored in HDFS from within the Java application?

Assuming that I have e.g. 10 files of size 30GB each stored on HDFS
on a cluster of 15 nodes, how can I run a Java program that accesses
these files and reads some blocks from them? Is it possible to do it
without copying the files with -copyToLocal?

If yes, could anyone give some general directions on the form such
Java code should take, and on how to run such a program?

Thank you in advance,
Sofia

You certainly can access a file on HDFS through a simple Java program. You can also access your files with an even simpler Python program using the Pydoop HDFS module (http://pydoop.sf.net/). Here's a simple Python script to print a file:


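import sys
import pydoop.hdfs as py_hdfs

# host 'default' with port 0 tells libhdfs to use the filesystem
# configured in core-site.xml
fs = py_hdfs.hdfs('default', 0)

f = fs.open_file("myfile", 'r')
for line in f:
    sys.stdout.write(line)  # each line already ends with a newline
f.close()
fs.close()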

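Sofia also asked how to read only some blocks of a large file without running -copyToLocal. The HDFS client streams the requested bytes from the datanodes, so nothing has to be copied to local disk first. Here is a minimal sketch of that idea, under a few assumptions: the file object returned by open_file supports the usual seek() and read() calls, and the block size is 64 MB (the default dfs.block.size in Hadoop 0.20; check your cluster's configuration). The helpers block_range and read_range are hypothetical names for illustration, not part of Pydoop.

```python
import io

# Assumption: 64 MB, the default dfs.block.size in Hadoop 0.20.
# Check hdfs-site.xml on your cluster for the real value.
BLOCK_SIZE = 64 * 1024 * 1024


def block_range(block_index, file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) of the block_index-th block of a file.

    The last block may be shorter than block_size."""
    offset = block_index * block_size
    if offset >= file_size:
        raise ValueError("block index past end of file")
    return offset, min(block_size, file_size - offset)


def read_range(f, offset, length, chunk_size=1024 * 1024):
    """Yield the bytes in [offset, offset + length) of a seekable file
    object, at most chunk_size bytes at a time, so a whole block never
    has to be held in memory at once."""
    f.seek(offset)
    remaining = length
    while remaining > 0:
        chunk = f.read(min(chunk_size, remaining))
        if not chunk:  # reached end of file earlier than expected
            break
        remaining -= len(chunk)
        yield chunk


if __name__ == "__main__":
    # Local stand-in for an HDFS file: anything with seek() and read().
    data = io.BytesIO(b"0123456789" * 10)
    off, n = block_range(2, 100, block_size=30)
    print(off, n)
    print(b"".join(read_range(data, off, n, chunk_size=7)))
```

Against HDFS you would pass the object returned by fs.open_file("myfile", 'r') instead of the BytesIO stand-in, together with the file's size (for example as reported by hadoop fs -ls). Reading in fixed-size chunks keeps memory use bounded even for 30GB files.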


--
Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
Pula 09010 (CA), Italy
Tel: +39 0709250452
