Answers in-line.

Alex
On Thu, Apr 9, 2009 at 3:45 PM, Stas Oskin <[email protected]> wrote:

> Hi.
>
> I have 2 questions about HDFS performance:
>
> 1) How fast are the read and write operations over network, in Mbps per
> second?

Hypertable (a BigTable implementation) has a good KFS vs. HDFS breakdown:
<http://code.google.com/p/hypertable/wiki/KFSvsHDFS>

> 2) If the chunk server is located on same host as the client, is there any
> optimization in read operations?
> For example, Kosmos FS describe the following functionality:
>
> "Localhost optimization: One copy of data
> is placed on the chunkserver on the same
> host as the client doing the write
>
> Helps reduce network traffic"

In Hadoop-speak, we're interested in DataNodes (storage nodes) and
TaskTrackers (compute nodes). In terms of MapReduce, Hadoop tries to
schedule tasks so that the data processed by a given task on a given
machine is also stored on that machine.

As for loading data into HDFS: if you load it from a machine that is itself
a DataNode, one replica is placed on that node. However, if you load it
from, say, your local machine, Hadoop will choose a DataNode at random.

> Regards.
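
To make the write-side behaviour above concrete, here is a minimal sketch of the rule as described in this reply. It is not Hadoop's actual placement code: the function name `choose_first_replica` and the host names are made up for illustration, and it only models where the first replica goes, ignoring rack awareness and the remaining replicas.

```python
import random


def choose_first_replica(client_host, datanodes, rng=random):
    """Pick the DataNode that receives the first replica of a block.

    Hypothetical helper, not a Hadoop API. It models only the rule
    described in this thread:
      - if the writing client runs on a DataNode, use that node;
      - otherwise pick a DataNode at random.
    """
    if client_host in datanodes:
        # Writer is co-located with a DataNode: keep one copy local.
        return client_host
    # Writer is outside the cluster: no local node to prefer.
    return rng.choice(sorted(datanodes))


if __name__ == "__main__":
    cluster = {"dn1", "dn2", "dn3"}  # made-up host names
    print(choose_first_replica("dn2", cluster))     # local write
    print(choose_first_replica("laptop", cluster))  # remote write
```

The first call always returns `dn2`, since the writer is on that node. The second returns any one of the three nodes, which is why loading a large dataset from a machine outside the cluster spreads the first replicas across DataNodes.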
