On 09/05/2011 05:42 PM, Sofia Georgiakaki wrote:
> Good evening,
> this topic seems very interesting. To be sure I have understood the case:
> do you mean that I can write a simple Java program and access a file
> stored in HDFS from within the Java application?
> Assuming that I have, e.g., 10 files of 30 GB each stored on HDFS
> on a cluster of 15 nodes, how can I run a Java program that accesses
> these files and reads some blocks from them? Is it possible to do it
> without copying the files via -copyToLocal?
> If yes, could anyone give some general directions on the form
> of such Java code, and on how to run such a program?
> Thank you in advance,
> Sofia

You certainly can access a file on HDFS through a simple Java program.
You can also access your files with an even simpler Python program using
the Pydoop HDFS module (http://pydoop.sf.net/). Here's a simple Python
script to print a file:
import pydoop.hdfs as py_hdfs

# 'default' with port 0 selects the file system named in the Hadoop configuration
fs = py_hdfs.hdfs('default', 0)
for line in fs.open_file("myfile", 'r'):
    print line
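
If only part of a large file is needed, rather than every line, something along the following lines may help. This is a minimal sketch, not part of Pydoop: read_range is a hypothetical helper name, and it assumes the file handle offers seek() and read() the way ordinary Python file objects do, which should be checked against the handle returned by open_file. The demo uses an in-memory io.BytesIO as a stand-in so that it runs without a cluster.

```python
import io


def read_range(f, offset, length):
    """Return up to `length` bytes starting at byte `offset` of file-like `f`.

    Hypothetical helper: `f` only needs seek() and read().
    """
    f.seek(offset)
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = f.read(remaining)
        if not chunk:  # end of file reached before `length` bytes were read
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# Stand-in for a file handle opened on HDFS (assumption: same seek/read interface)
demo = io.BytesIO(b"0123456789abcdef")
first = read_range(demo, 4, 6)    # six bytes starting at offset 4
tail = read_range(demo, 12, 100)  # request runs past the end of the file
```

The loop is there because a single read() on a remote file may return fewer bytes than requested; the helper keeps reading until it has the full range or reaches the end of the file.
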
--
Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
Pula 09010 (CA), Italy
Tel: +39 0709250452