Good evening. This topic seems very interesting. To be sure I have understood the case: do you mean that I can write a simple Java program and access a file stored in HDFS from within the Java application?
Assuming I have, e.g., 10 files of 30 GB each stored in HDFS on a cluster of 15 nodes, how can I run a Java program that accesses these files and reads some blocks from them? Is it possible to do this without first copying the files via -copyToLocal? If so, could anyone give some general directions on the overall form of such Java code, and on how to run such a program?

Thank you in advance,
Sofia

________________________________
From: Uma Maheswara Rao G 72686 <[email protected]>
To: [email protected]
Sent: Monday, September 5, 2011 6:04 PM
Subject: Re: Is it possible to access the HDFS via Java OUTSIDE the Cluster?

Hi,

It is very much possible. In fact, that is the main use case for Hadoop :-)
You need to put the hadoop-hdfs*.jar and hadoop-common*.jar files on the classpath of the machine from which you want to run the client program.
On the client node, use sample code like the following:

    Configuration conf = new Configuration();
    // you can set the required configurations here
    FileSystem fs = new DistributedFileSystem();
    fs.initialize(new URI(<Name_Node_URL>), conf);
    fs.copyToLocalFile(srcPath, destPath);
    fs.copyFromLocalFile(srcPath, destPath);
    // ...etc.

The FileSystem class exposes many more APIs, so you can make use of them.

Regards,
Uma

----- Original Message -----
From: Ralf Heyde <[email protected]>
Date: Monday, September 5, 2011 7:59 pm
Subject: Is it possible to access the HDFS via Java OUTSIDE the Cluster?
To: [email protected]

> Hello,
>
> I have found an HDFSClient which shows me how to access my HDFS from inside
> the cluster (i.e. running on a node).
>
> My idea is that different processes may write 64M chunks to HDFS from
> external sources/clients.
>
> Is that possible?
>
> How can that be done? Does anybody have some example code?
>
> Thanks,
>
> Ralf
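
On reading only some blocks of a large file without copying it locally: the stream returned by FileSystem.open(Path) is an FSDataInputStream, which supports seek(long), so a client mainly needs to know which byte range a block covers. The sketch below shows only that arithmetic in plain Java, with no Hadoop dependency. It assumes a fixed 64 MB block size (the chunk size mentioned in the original question; the real value is a per-cluster and per-file setting), and BlockOffsets and its methods are hypothetical helper names, not part of the Hadoop API.

```java
/** Byte-range arithmetic for fixed-size blocks. Hypothetical helper, not a Hadoop class. */
final class BlockOffsets {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    private BlockOffsets() {}

    /** Number of blocks needed to hold fileSize bytes (the last block may be partial). */
    static long blockCount(long fileSize, long blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and blockSize > 0");
        }
        return (fileSize + blockSize - 1) / blockSize;
    }

    /** Byte offset at which the block with the given zero-based index begins. */
    static long blockStart(long blockIndex, long blockSize) {
        if (blockIndex < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("blockIndex must be >= 0 and blockSize > 0");
        }
        return blockIndex * blockSize;
    }

    /** Number of bytes in the given block; shorter than blockSize only for the last block. */
    static long blockLength(long fileSize, long blockIndex, long blockSize) {
        if (blockIndex >= blockCount(fileSize, blockSize)) {
            throw new IllegalArgumentException("blockIndex is past the end of the file");
        }
        long start = blockStart(blockIndex, blockSize);
        return Math.min(blockSize, fileSize - start);
    }

    public static void main(String[] args) {
        long fileSize = 30L * GB;   // one of the 30 GB files from the question
        long blockSize = 64L * MB;  // assumed block size, see the note above
        long k = 3;                 // arbitrary block index, for illustration only

        System.out.println("blocks per file: " + blockCount(fileSize, blockSize));
        System.out.println("block " + k + " starts at byte " + blockStart(k, blockSize)
                + " and spans " + blockLength(fileSize, k, blockSize) + " bytes");
    }
}
```

With these numbers a 30 GB file spans 480 blocks. In a real client, the value from blockStart would be passed to seek on the opened stream and up to blockLength bytes read from there; the actual block size of a file should be taken from the cluster rather than hard-coded.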
