Hello! I have a 7-node cluster, plus one remote node (an 8th machine) on the same LAN that holds some data. I need to place this data into HDFS. This 8th machine is not part of the Hadoop cluster's master/slave configuration files.
So, what I have thought is:
-> Get the HDFS FileSystem instance using the FileSystem API.
-> Get the local file system instance of the remote machine using the same API, by passing a different config file that simply sets fs.default.name.
-> Then simply use the API's methods to copy the data into HDFS and get it back.
-> Throughout, I will have to take care of proxy issues so the remote node can connect to the NameNode.

Is this procedure correct?

Also, I am currently an undergraduate. I want to be a part of the Hadoop project and get into the development of its various sub-projects. Is that feasible?

Thanking you,

On Fri, Jun 5, 2009 at 11:19 PM, Alex Loddengaard <[email protected]> wrote:

> Hi,
>
> The throughput of HDFS is good, because each read is basically a stream
> from several hard drives (each hard drive holds a different block of the
> file, and these blocks are distributed across many machines). That said,
> HDFS does not have very good latency, at least compared to local file
> systems.
>
> When you write a file using the HDFS client (whether it be Java or
> bin/hadoop fs), the client and the name node coordinate to put your file
> on various nodes in the cluster. When you use that same client to read
> data, your client coordinates with the name node to get block locations
> for a given file and does a HTTP GET request to fetch those blocks from
> the nodes which store them.
>
> You could in theory get data off of the local file system on your data
> nodes, but this wouldn't make any sense, because the client does
> everything for you already.
>
> Hope this clears things up.
>
> Alex
>
> On Fri, Jun 5, 2009 at 12:53 AM, Sugandha Naolekar
> <[email protected]> wrote:
>
> > Hello!
> >
> > Placing any kind of data into HDFS and then getting it back, can this
> > activity be fast? Also, the node of which I have to place the data in
> > HDFS is a remote node.
> > So then, will I have to use the RPC mechanism, or can I simply get the
> > local file system of that node and do the things?
> >
> > --
> > Regards!
> > Sugandha

--
Regards!
Sugandha
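For what it's worth, the plan described at the top of the thread can be sketched with the Hadoop FileSystem API roughly as follows. This is only a minimal sketch, not a tested implementation: the NameNode host/port and all paths are placeholders, and it assumes the remote (8th) machine can reach the NameNode directly, with any proxy/firewall issues already handled.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCopySketch {
    public static void main(String[] args) throws Exception {
        // Point the client at the cluster's NameNode.
        // "namenode-host:9000" is a placeholder; use your own fs.default.name.
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode-host:9000");

        // FileSystem.get() returns an HDFS client bound to that NameNode.
        FileSystem hdfs = FileSystem.get(conf);

        // Copy a file from the remote machine's local disk into HDFS.
        // Both paths are hypothetical examples.
        Path localSrc = new Path("/local/path/data.txt");
        Path hdfsDst  = new Path("/user/sugandha/data.txt");
        hdfs.copyFromLocalFile(localSrc, hdfsDst);

        // Later, pull the data back out of HDFS onto the local disk.
        hdfs.copyToLocalFile(hdfsDst, new Path("/local/path/restored.txt"));

        hdfs.close();
    }
}
```

Run from the 8th machine with the Hadoop jars on the classpath, this does the whole round trip; the machine does not need to appear in the cluster's master/slave files, since it acts purely as an HDFS client.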
