Hi Jun,

Thanks for the pointer on clearing the disk cache, and for the suggestion of a DFS client cache layer.

As for the double-buffering overhead, I don't think buffering in the DataNode will provide much benefit, since the DataNode never uses the data in the buffer itself; it just forwards it to HDFS clients. With zero-copy I/O it probably shouldn't buffer any data at all: it could sendfile() the data directly from disk to the network client via DMA, rather than copying it from disk into its memory space and then from memory to the socket (a rough sketch of that path is below).

The downside of a DFS client cache is that it would need to be kept consistent, which would probably add a lot of complexity to the client. It is an interesting idea, though, and I think we should keep thinking about it.
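
To make the zero-copy point concrete, here is a rough sketch (not actual DataNode code; the class, path, and destination are made up for illustration) of how Java NIO's FileChannel.transferTo() could push a block file straight to a client socket. On Linux the JVM can implement transferTo() with sendfile(), so the bytes never have to enter the JVM heap:

import java.io.FileInputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

public class ZeroCopySend {
    /** Stream a local block file to a remote reader without copying it into the JVM heap. */
    public static void send(String blockPath, String host, int port) throws IOException {
        FileChannel file = new FileInputStream(blockPath).getChannel();
        SocketChannel sock = SocketChannel.open(new InetSocketAddress(host, port));
        try {
            long pos = 0;
            long size = file.size();
            while (pos < size) {
                // transferTo() may move fewer bytes than requested, so loop until done.
                pos += file.transferTo(pos, size - pos, sock);
            }
        } finally {
            sock.close();
            file.close();
        }
    }
}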

Thanks,
George

Jun Rao wrote:

Hi, George,

I read the results in your JIRA. Very encouraging. It would be useful to test the improvement on both cold and warm data (warm data will likely show the larger improvement). There is a simple way to clear the file cache on Linux (http://www.linuxinsight.com/proc_sys_vm_drop_caches.html).
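
For example, the cold-read runs could drop the page cache between iterations. A minimal helper along those lines (assumes Linux and root privileges; not part of any existing test harness) might look like:

import java.io.IOException;

public class PageCacheDropper {
    /** Flush dirty pages, then drop the page cache, dentries, and inodes before a cold-read run. */
    public static void dropCaches() throws IOException, InterruptedException {
        Process p = new ProcessBuilder(
                "sh", "-c", "sync && echo 3 > /proc/sys/vm/drop_caches")
                .redirectErrorStream(true)
                .start();
        if (p.waitFor() != 0) {
            throw new IOException("drop_caches failed (root required)");
        }
    }
}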

An alternative approach is to build an in-memory caching layer on top of a DFS Client. The advantages are (1) fewer security issues; (2) probably even better performance, since checksumming can be avoided once the data is cached in memory; (3) the caching layer can be used anywhere, not just on nodes that own a block locally. The disadvantage is that data is buffered twice in memory: once in the caching layer and once in the OS file cache. One can probably limit the OS file cache size (not sure if there is an easy way in Linux). What are your thoughts on this?
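
Roughly, such a layer could look like the sketch below (all class and method names here are hypothetical, not HDFS APIs): an LRU map in front of whatever actually fetches and checksums the data, so repeated reads of hot blocks never leave the JVM.

import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class CachingReader {

    /** Hypothetical hook around a DFS client positioned read. */
    public interface BlockFetcher {
        byte[] fetch(String path, long offset, int len) throws IOException;
    }

    private final Map<String, byte[]> cache;
    private final BlockFetcher fetcher;

    public CachingReader(final int maxEntries, BlockFetcher fetcher) {
        this.fetcher = fetcher;
        // Access-ordered LinkedHashMap plus removeEldestEntry gives simple LRU eviction.
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public synchronized byte[] read(String path, long offset, int len) throws IOException {
        String key = path + "@" + offset + ":" + len;
        byte[] data = cache.get(key);
        if (data == null) {
            // Miss: go through the normal client path (data is checksummed once, here).
            data = fetcher.fetch(path, offset, len);
            cache.put(key, data);
        }
        return data;
    }
}

A real cache would also need to bound memory in bytes rather than entry count and handle invalidation, which ties back to the consistency concern.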

Jun

george.por...@sun.com wrote on 01/08/2009 10:13:25 AM:

> Hi Jun,
>
> The earlier responses to your email reference the JIRA that I opened
> about this issue.  Short-circuiting the primary HDFS datapath does
> improve throughput, and the amount depends on your workload (random
> reads especially). Some initial experimental results are posted to that
> JIRA.  A second advantage is that since the JVM hosting the HDFS client
> is doing the reading, the O/S will satisfy future disk requests from the
> cache, which isn't really possible when you read over the network (even
> to another JVM on the same host).
>
> There are several real disadvantages, the largest of which are that 1)
> it adds a new datapath, and 2) it bypasses various security and
> auditing features of HDFS.  I would certainly like to think through a
> cleaner interface for achieving this goal, especially since reading
> local data should be the common case.  Any thoughts you might have
> would be appreciated.
>
> Thanks,
> George
>
> Jun Rao wrote:
> > Hi,
> >
> > Today, HDFS always reads through a socket even when the data is local to
> > the client. This adds a lot of overhead, especially for warm reads. It
> > should be possible for a DFS client to test whether a block to be read is
> > local and, if so, bypass the socket and read through the local FS API
> > directly. This should improve random access performance significantly
> > (e.g., for HBase). Has this been considered in HDFS? Thanks,
> >
> > Jun
> >
> >
