Hi Jun,
Thanks for the pointer for clearing the disk cache, as well as the
suggestion of a DFS Client cache layer. As for the double-buffering
overhead, I don't think there would be much benefit to buffering in the
DataNode, since the DataNode itself never uses the data in the buffer;
it just forwards it to HDFS clients. With the ability to perform
zero-copy I/O, it probably shouldn't buffer any data at all: it could
sendfile() the data directly from disk to the network client via DMA,
rather than copying it from disk into its own address space and then
from memory to the socket. The downside of a DFS Client cache is that
it would need to be kept consistent, which would probably add a lot of
complexity to the client. It is an interesting idea, though, and I
think we should keep thinking about it.
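To make the zero-copy point concrete, here is a minimal sketch in plain
Java NIO (not actual DataNode code; the block path and port are made up
for illustration) of serving a block file to a client socket with
FileChannel.transferTo(), which maps to sendfile() on Linux:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;

    public class ZeroCopyBlockSender {
        public static void main(String[] args) throws IOException {
            // Hypothetical block file and port, just for illustration.
            String blockPath = "/data/dfs/blk_12345";
            int port = 50010;

            try (ServerSocketChannel server = ServerSocketChannel.open()) {
                server.bind(new InetSocketAddress(port));
                try (SocketChannel client = server.accept();
                     FileChannel block = new FileInputStream(blockPath).getChannel()) {
                    long pos = 0, size = block.size();
                    // transferTo() uses sendfile() on Linux, so the block bytes
                    // go disk -> socket without being copied into the JVM heap.
                    while (pos < size) {
                        pos += block.transferTo(pos, size - pos, client);
                    }
                }
            }
        }
    }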
Thanks,
George
Jun Rao wrote:
Hi, George,
I read the results in your JIRA. Very encouraging. It would be useful
to test the improvement on both cold and warm data (warm data likely
sees a larger improvement). There is a simple way to clear the file
cache on Linux (http://www.linuxinsight.com/proc_sys_vm_drop_caches.html).
An alternative approach is to build an in-memory caching layer on top
of a DFS Client. The advantages are (1) fewer security issues; (2)
probably even better performance, since checksumming can be avoided
once the data is cached in memory; (3) the caching layer can be used
anywhere, not just on nodes owning a block locally. The disadvantage is
that data is buffered twice in memory: once in the caching layer and
once in the OS file cache. One can probably limit the OS file cache
size (not sure if there is an easy way in Linux). What are your
thoughts on this?
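A very rough sketch of the kind of caching layer described above (class
and method names are placeholders, not an existing HDFS API; it just
keeps checksum-verified block data in a bounded LRU map on the client
side):

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Rough sketch only: a bounded LRU cache keyed by block id that a DFS
    // client wrapper could consult before going to the DataNode.
    public class ClientBlockCache {
        private final Map<String, byte[]> cache;

        public ClientBlockCache(int maxEntries) {
            // access-order LinkedHashMap gives simple LRU eviction
            this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                    return size() > maxEntries;
                }
            };
        }

        public synchronized byte[] get(String blockKey) {
            return cache.get(blockKey);
        }

        public synchronized void put(String blockKey, byte[] data) {
            // Data is assumed to be checksum-verified before insertion, so
            // cached reads can skip checksumming, as suggested above.
            cache.put(blockKey, data);
        }
    }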
Jun
george.por...@sun.com wrote on 01/08/2009 10:13:25 AM:
> Hi Jun,
>
> The earlier responses to your email reference the JIRA that I opened
> about this issue. Short-circuiting the primary HDFS datapath does
> improve throughput, and the amount depends on your workload (random
> reads especially). Some initial experimental results are posted to
> that JIRA. A second advantage is that since the JVM hosting the HDFS
> client is doing the reading, the O/S will satisfy future disk requests
> from the cache, which isn't really possible when you read over the
> network (even to another JVM on the same host).
>
> There are several real disadvantages, the largest of which are that
> (1) it adds a new datapath, and (2) it bypasses various security and
> auditing features of HDFS. I would certainly like to think through a
> cleaner interface for achieving this goal, especially since reading
> local data should be the common case. Any thoughts you might have
> would be appreciated.
>
> Thanks,
> George
>
> Jun Rao wrote:
> > Hi,
> >
> > Today, HDFS always reads through a socket even when the data is
> > local to the client. This adds a lot of overhead, especially for
> > warm reads. It should be possible for a DFS client to test whether a
> > block to be read is local and, if so, bypass the socket and read
> > through the local FS API directly. This should improve random access
> > performance significantly (e.g., for HBase). Has this been
> > considered in HDFS? Thanks,
> >
> > Jun
> >
> >