Hi.

> Hypertable (a BigTable implementation) has a good KFS vs. HDFS breakdown:
> <http://code.google.com/p/hypertable/wiki/KFSvsHDFS>

From this comparison it seems KFS is considerably faster than HDFS for small data transfers (for SQL commands). Any idea whether the same holds true for small-to-medium (20 MB - 150 MB) files?

> > 2) If the chunk server is located on same host as the client, is there any
> > optimization in read operations?
> > For example, Kosmos FS describe the following functionality:
> >
> > "Localhost optimization: One copy of data
> > is placed on the chunkserver on the same
> > host as the client doing the write
> >
> > Helps reduce network traffic"
>
> In Hadoop-speak, we're interested in DataNodes (storage nodes) and
> TaskTrackers (compute nodes). In terms of MapReduce, Hadoop does try and
> schedule tasks such that the data being processed by a given task on a given
> machine is also on that machine. As for loading data onto a DataNode,
> loading data from a DataNode will put a replica on that node. However, if
> you're loading data from, say, your local machine, Hadoop will choose a
> DataNode at random.

Ah, so if a DataNode stores a file to HDFS, HDFS would try to place a replica on that same DataNode as well? And then, if this DataNode later reads the file, would HDFS try to read it from itself first?

Regards.
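
P.S. To check that I have understood the placement rule described above, here is a toy sketch of it in Python. It is only a model of the behaviour as I read it, not Hadoop's actual code: the function names and host names are made up for illustration, and the read-side preference is my assumption (it is exactly what I am asking about), not something confirmed in this thread.

```python
import random


def choose_first_replica(writer_host, datanodes, rng=random):
    """Toy model of the write-side rule quoted above (hypothetical helper)."""
    if writer_host in datanodes:
        # The writer is itself a DataNode: keep one replica on that node.
        return writer_host
    # The writer is an outside client: a DataNode is chosen at random.
    return rng.choice(sorted(datanodes))


def choose_read_source(reader_host, replica_hosts, rng=random):
    """ASSUMPTION, not confirmed: a reader prefers a replica on its own host."""
    if reader_host in replica_hosts:
        return reader_host
    return rng.choice(sorted(replica_hosts))


if __name__ == "__main__":
    datanodes = {"node1", "node2", "node3"}  # hypothetical cluster
    print(choose_first_replica("node2", datanodes))   # node2
    print(choose_first_replica("laptop", datanodes))  # some node, picked at random
```

If the read side really works like `choose_read_source`, then a DataNode that wrote a file would also serve its own reads of that file without going over the network.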
