Hello!

I have a 7-node cluster, but there is one remote node (an 8th machine) within
the same LAN which holds some data. I now need to place this data
into HDFS. This 8th machine is not listed in the Hadoop cluster's
master/slave config files.

So, what I have in mind is:
-> I will get the HDFS FileSystem instance by using the FileSystem API.
-> I will get the local file system's (the remote machine's) instance by
using the same API, passing a different config file which simply sets the
fs.default.name property.

-> I will then use the API's methods to copy the data into HDFS and to get
it back from HDFS.
-> Throughout, I will have to take care of any proxy issues so that the
remote node can connect to the NameNode.

Is this procedure correct?
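
For the config-file step in the plan above, here is a minimal sketch of what
such a file could contain. It only builds and writes the XML using the plain
JDK, so it runs without Hadoop installed. The class name, the output file name
and the NameNode address (namenode.example.com:9000) are all assumptions made
for illustration; the real host and port have to come from the cluster's own
configuration.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ClientConfWriter {

    /** Builds a minimal Hadoop-style config that only sets fs.default.name. */
    public static String buildClientConf(String nameNodeUri) {
        URI uri = URI.create(nameNodeUri);
        // Reject anything that is not an hdfs:// address with a host part.
        if (!"hdfs".equals(uri.getScheme()) || uri.getHost() == null) {
            throw new IllegalArgumentException(
                "expected hdfs://host:port, got " + nameNodeUri);
        }
        return "<?xml version=\"1.0\"?>\n"
             + "<configuration>\n"
             + "  <property>\n"
             + "    <name>fs.default.name</name>\n"
             + "    <value>" + uri + "</value>\n"
             + "  </property>\n"
             + "</configuration>\n";
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical NameNode address; pass the real one as the first argument.
        String nameNode = args.length > 0
            ? args[0]
            : "hdfs://namenode.example.com:9000";
        // Hypothetical file name for the 8th machine's client-side config.
        Path out = Paths.get("remote-client-site.xml");
        Files.write(out, buildClientConf(nameNode).getBytes(StandardCharsets.UTF_8));
        System.out.println("wrote " + out.toAbsolutePath());
    }
}
```

On the 8th machine, a file like this could then be loaded with Hadoop's
Configuration.addResource(...), after which FileSystem.get(conf) should hand
back an HDFS client and copyFromLocalFile / copyToLocalFile can move the data
in and out. That part is not shown here because it needs the Hadoop jars on
the classpath.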

Also, I am currently an undergraduate. I would like to be a part of the Hadoop
project and contribute to the development of its various sub-projects. Is
that feasible?

Thanking You,


On Fri, Jun 5, 2009 at 11:19 PM, Alex Loddengaard <[email protected]> wrote:

> Hi,
>
> The throughput of HDFS is good, because each read is basically a stream from
> several hard drives (each hard drive holds a different block of the file,
> and these blocks are distributed across many machines).  That said, HDFS
> does not have very good latency, at least compared to local file systems.
>
> When you write a file using the HDFS client (whether it be Java or
> bin/hadoop fs), the client and the name node coordinate to put your file on
> various nodes in the cluster.  When you use that same client to read data,
> your client coordinates with the name node to get block locations for a
> given file and then reads those blocks directly from the data nodes
> which store them.
>
> You could in theory get data off of the local file system on your data
> nodes, but this wouldn't make any sense, because the client does everything
> for you already.
>
> Hope this clears things up.
>
> Alex
>
> On Fri, Jun 5, 2009 at 12:53 AM, Sugandha Naolekar
> <[email protected]> wrote:
>
> > Hello!
> >
> > Placing any kind of data into HDFS and then getting it back: can this
> > activity be fast? Also, the node whose data I have to place in HDFS
> > is a remote node. So will I have to use an RPC mechanism, or can I simply
> > get the local filesystem of that node and work with that?
> >
> > --
> > Regards!
> > Sugandha
> >
>



-- 
Regards!
Sugandha
