Hi.

>
> Hypertable (a BigTable implementation) has a good KFS vs. HDFS breakdown:
> <http://code.google.com/p/hypertable/wiki/KFSvsHDFS>
>

From this comparison, it seems KFS is quite a bit faster than HDFS for small
data transfers (for SQL commands).

Any idea if the same holds true for small-to-medium (20 MB - 150 MB) files?


>
> >
> >
> > 2) If the chunk server is located on same host as the client, is there
> > any optimization in read operations?
> > For example, Kosmos FS describe the following functionality:
> >
> > "Localhost optimization: One copy of data
> > is placed on the chunkserver on the same
> > host as the client doing the write
> >
> > Helps reduce network traffic"
>
> In Hadoop-speak, we're interested in DataNodes (storage nodes) and
> TaskTrackers (compute nodes).  In terms of MapReduce, Hadoop does try and
> schedule tasks such that the data being processed by a given task on a
> given machine is also on that machine.  As for loading data onto a DataNode,
> loading data from a DataNode will put a replica on that node.  However, if
> you're loading data from, say, your local machine, Hadoop will choose a
> DataNode at random.
>

Ah, so if a DataNode stores a file to HDFS, HDFS would try to place a replica
on that same DataNode as well? And if that DataNode later reads the file,
would HDFS try to read it from the local replica first?
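
To check that I understand the write side, here is a minimal sketch of the
placement rule as you described it. The function name, host names, and the
`rng` parameter are all made up for illustration; this is not Hadoop's actual
API or code, only my reading of the description above.

```python
import random


def choose_first_replica(writer_host, datanodes, rng=random):
    """Pick the node for the first replica of a newly written block.

    Hypothetical helper (not part of Hadoop): it only mirrors the rule
    described in this thread.
    """
    if not datanodes:
        raise ValueError("no DataNodes available")
    # Writer is itself a DataNode: keep one replica on that same node.
    if writer_host in datanodes:
        return writer_host
    # Writer is outside the cluster (e.g. a local machine): pick at random.
    return rng.choice(list(datanodes))
```

So a call like `choose_first_replica("dn2", ["dn1", "dn2", "dn3"])` would
return "dn2", while a writer that is not in the list would get any one of the
three nodes.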

Regards.
