Re: HDFS read/write speeds, and read optimization

Stas Oskin Fri, 10 Apr 2009 07:38:03 -0700

Hi.

>
> Hypertable (a BigTable implementation) has a good KFS vs. HDFS breakdown:
> <http://code.google.com/p/hypertable/wiki/KFSvsHDFS>
>


From this comparison it seems KFS is considerably faster than HDFS for small
data transfers (for SQL commands).

Any idea whether the same holds true for small-to-medium (20 MB - 150 MB) files?


>
> >
> >
> > 2) If the chunk server is located on same host as the client, is there any
> > optimization in read operations?
> > For example, Kosmos FS describe the following functionality:
> >
> > "Localhost optimization: One copy of data
> > is placed on the chunkserver on the same
> > host as the client doing the write
> >
> > Helps reduce network traffic"
>
> In Hadoop-speak, we're interested in DataNodes (storage nodes) and
> TaskTrackers (compute nodes).  In terms of MapReduce, Hadoop does try and
> schedule tasks such that the data being processed by a given task on a given
> machine is also on that machine.  As for loading data onto a DataNode,
> loading data from a DataNode will put a replica on that node.  However, if
> you're loading data from, say, your local machine, Hadoop will choose a
> DataNode at random.
>

Ah, so if a DataNode stores a file to HDFS, HDFS will try to place a replica
on that same DataNode as well? And if this DataNode then tries to read the
file, will HDFS try to read it from the local replica first?
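
To make the question concrete, here is a toy sketch of the behaviour as I
understand it. This is not Hadoop code: the function names and host names are
hypothetical. The write side follows the description quoted above; the read
side is only my assumption, and is exactly what I am asking about.

```python
import random

def choose_first_replica(writer_host, datanodes, rng=random):
    """Toy model: pick the node that receives the first replica of a block."""
    if writer_host in datanodes:
        # The writer is itself a DataNode, so one copy stays local.
        return writer_host
    # The writer is an outside client, so any DataNode may be chosen.
    return rng.choice(sorted(datanodes))

def choose_read_source(reader_host, replica_hosts, rng=random):
    """Toy model (assumed behaviour): pick the replica to read from."""
    if reader_host in replica_hosts:
        # A replica lives on the reader's own host, so read it without
        # going over the network.
        return reader_host
    # Otherwise fall back to some remote replica.
    return rng.choice(sorted(replica_hosts))

if __name__ == "__main__":
    nodes = {"dn1", "dn2", "dn3"}
    print(choose_first_replica("dn2", nodes))        # writer is a DataNode
    print(choose_first_replica("laptop", nodes))     # writer is an outside client
    print(choose_read_source("dn2", ["dn2", "dn3"])) # reader holds a replica
```

If the read side really works this way, a DataNode that wrote a file could
later read it back without any network transfer.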

Regards.
