George,

Thanks for the comment. Note that HDFS files are never updated
in-place, which should make it easy to deal with cache consistency.
(Rough sketches of a few of the ideas discussed in this thread are
appended below the quoted messages.)

Jun

IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA 95120-6099
jun...@almaden.ibm.com
(408)927-1886 (phone)
(408)927-3215 (fax)

george.por...@sun.com wrote on 01/12/2009 10:11:08 AM:

> Hi Jun,
>
> Thanks for the pointer on clearing the disk cache, as well as the
> suggestion for creating a DFS Client cache layer. As for the
> double-buffering overhead, I don't think there is going to be a large
> benefit to buffering in the DataNode, since the DataNode itself does
> not use the data in the buffer (it just forwards that data to HDFS
> clients). With the ability to perform zero-copy I/O operations, it
> probably shouldn't buffer any data at all, since it could just
> sendfile() the data directly from the disk to the network client via
> DMA, rather than copying it from disk into its memory space and then
> from memory to the socket. The downside of a DFS client cache is that
> it would need to be kept consistent, which would probably add a lot of
> complexity to the client. It is an interesting idea, though, and I
> think we should keep thinking about it.
>
> Thanks,
> George
>
> Jun Rao wrote:
> >
> > Hi George,
> >
> > I read the results in your JIRA. Very encouraging. It would be
> > useful to test the improvement on both cold and warm data (warm data
> > likely shows a larger improvement). There is a simple way to clear
> > the file cache on Linux
> > (http://www.linuxinsight.com/proc_sys_vm_drop_caches.html).
> >
> > An alternative approach is to build an in-memory caching layer on
> > top of a DFS Client. The advantages are (1) fewer security issues;
> > (2) probably even better performance, since the checksum can be
> > avoided once the data is cached in memory; (3) the caching layer can
> > be used anywhere, not just on nodes owning a block locally. The
> > disadvantage is that data is buffered twice in memory: once in the
> > caching layer and once in the OS file cache. One could probably
> > limit the OS file cache size (not sure if there is an easy way in
> > Linux). What's your thought on this?
> >
> > Jun
> >
> > george.por...@sun.com wrote on 01/08/2009 10:13:25 AM:
> >
> > > Hi Jun,
> > >
> > > The earlier responses to your email reference the JIRA that I
> > > opened about this issue. Short-circuiting the primary HDFS
> > > datapath does improve throughput, and the amount depends on your
> > > workload (random reads especially). Some initial experimental
> > > results are posted to that JIRA. A second advantage is that, since
> > > the JVM hosting the HDFS client is doing the reading, the O/S will
> > > satisfy future disk requests from the cache, which isn't really
> > > possible when you read over the network (even to another JVM on
> > > the same host).
> > >
> > > There are several real disadvantages, the largest of which are
> > > that it 1) adds a new datapath and 2) bypasses various security
> > > and auditing features of HDFS. I would certainly like to think
> > > through a cleaner interface for achieving this goal, especially
> > > since reading local data should be the common case. Any thoughts
> > > you might have would be appreciated.
> > >
> > > Thanks,
> > > George
> > >
> > > Jun Rao wrote:
> > > > Hi,
> > > >
> > > > Today, HDFS always reads through a socket even when the data is
> > > > local to the client. This adds a lot of overhead, especially for
> > > > warm reads.
> > > > It should be possible for a DFS client to test whether a block
> > > > to be read is local and, if so, bypass the socket and read
> > > > through the local FS API directly. This should improve random
> > > > access performance significantly (e.g., for HBase). Has this
> > > > been considered in HDFS? Thanks,
> > > >
> > > > Jun
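
Sketch 1: the client-side caching layer. Since HDFS files are never
updated in-place (as noted above), a cached block can never go stale,
so the cache needs no invalidation protocol at all; the only eviction
policy required is something like LRU to bound memory. The class,
method, and key names below are all made up for illustration (this is
not existing HDFS code); the key is simply the file path plus the
block's start offset.

import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical client-side block cache. Blocks are write-once, so a
 * cached copy can never go stale; LRU eviction only bounds memory.
 */
public class BlockCache {

    /** A block is identified by its file and its start offset within the file. */
    public static final class Key {
        final String path;
        final long blockStart;

        Key(String path, long blockStart) {
            this.path = path;
            this.blockStart = blockStart;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return blockStart == k.blockStart && path.equals(k.path);
        }

        @Override public int hashCode() {
            return 31 * path.hashCode() + Long.hashCode(blockStart);
        }
    }

    /** Supplies block bytes on a cache miss, e.g. a normal checksummed DFS read. */
    public interface BlockReader {
        byte[] read(Key key) throws IOException;
    }

    private final Map<Key, byte[]> lru;

    public BlockCache(final int maxEntries) {
        // Access-ordered LinkedHashMap gives a simple LRU bound.
        this.lru = new LinkedHashMap<Key, byte[]>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<Key, byte[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached block, loading and checksumming it at most once. */
    public synchronized byte[] get(Key key, BlockReader reader) throws IOException {
        byte[] data = lru.get(key);
        if (data == null) {
            data = reader.read(key);  // checksum verified here, on the miss path
            lru.put(key, data);       // no invalidation needed: blocks are immutable
        }
        return data;
    }
}

The checksum advantage mentioned above falls out naturally: the
checksum is verified once on the read that populates the cache, and
later hits skip it.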
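
Sketch 2: the sendfile() point. In Java the same zero-copy path is
exposed as FileChannel.transferTo(), which for a file-to-socket
transfer on Linux typically maps to sendfile(2), so the block bytes go
from disk to the socket without passing through the JVM heap at all.
A minimal sketch; the block-file path, host, and port below are
placeholders, not actual DataNode values.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        try (FileChannel src = FileChannel.open(
                 Paths.get("/data/dn/blk_1234"), StandardOpenOption.READ); // placeholder block file
             SocketChannel dst = SocketChannel.open(
                 new InetSocketAddress("client-host", 4000))) {            // placeholder peer
            long pos = 0;
            long remaining = src.size();
            // transferTo() may send less than requested, so loop until done.
            // On Linux this becomes sendfile(2): disk -> socket with no copy
            // into this process's user-space buffers.
            while (remaining > 0) {
                long sent = src.transferTo(pos, remaining, dst);
                pos += sent;
                remaining -= sent;
            }
        }
    }
}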
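
Sketch 3: the local-read short-circuit itself. The shape of the check
is: ask the NameNode where the block replicas live
(FileSystem.getFileBlockLocations() is a real call), and if one of
them is this host, read the underlying block file with plain java.io
instead of going through the DataNode socket. The
resolveLocalBlockFile() helper is entirely hypothetical; today there
is no public client API that maps a block to the DataNode's local
file, and that missing (and security-sensitive) mapping is exactly the
interface being debated in this thread.

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.net.InetAddress;

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalReadSketch {

    /** Read len bytes at offset, short-circuiting the DataNode when the block is local. */
    public static int read(FileSystem fs, Path path, long offset, byte[] buf, int len)
            throws IOException {
        FileStatus status = fs.getFileStatus(path);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, offset, len);
        String localHost = InetAddress.getLocalHost().getHostName();

        if (blocks.length > 0 && hasHost(blocks[0], localHost)) {
            // Hypothetical: a trusted way to map (file, offset) to the block
            // file the local DataNode keeps on disk. Nothing like this exists
            // in the client API today.
            File blockFile = resolveLocalBlockFile(path, offset);
            long offsetInBlock = offset - blocks[0].getOffset();
            try (RandomAccessFile raf = new RandomAccessFile(blockFile, "r")) {
                raf.seek(offsetInBlock);
                return raf.read(buf, 0, len);  // plain local FS read, no socket
            }
        }

        // Fall back to the normal socket path through the DataNode.
        try (FSDataInputStream in = fs.open(path)) {
            in.seek(offset);
            return in.read(buf, 0, len);
        }
    }

    private static boolean hasHost(BlockLocation block, String host) throws IOException {
        for (String h : block.getHosts()) {
            if (h.equals(host)) return true;
        }
        return false;
    }

    private static File resolveLocalBlockFile(Path path, long offset) {
        // Placeholder for the block-to-local-file lookup under discussion.
        throw new UnsupportedOperationException("hypothetical");
    }
}

Note that reading the block file directly like this also skips HDFS's
checksum verification, on top of the security and auditing concerns
raised above.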
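
Sketch 4: clearing the Linux file cache for cold-read runs. The linked
page boils down to writing to /proc/sys/vm/drop_caches as root (1 =
page cache, 2 = dentries and inodes, 3 = both), after syncing dirty
pages. A tiny sketch of doing that from Java before a benchmark run;
this is just the mechanism from the link, not anything HDFS provides.

import java.io.FileWriter;
import java.io.IOException;

public class DropPageCache {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Flush dirty pages first so dropping the caches cannot lose writes.
        new ProcessBuilder("sync").inheritIO().start().waitFor();
        // "3" drops both the page cache and the dentry/inode caches (requires root).
        try (FileWriter w = new FileWriter("/proc/sys/vm/drop_caches")) {
            w.write("3");
        }
    }
}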