Hi Jun,
Thanks for the pointer for clearing the disk cache, as well as the
suggestion of a DFS Client cache layer. As for the double-buffering
overhead, I don't think there would be much benefit to buffering in the
DataNode, since the DataNode itself never uses the data in the buffer;
it just forwards it to HDFS clients. With the ability to perform
zero-copy I/O, it probably shouldn't buffer any data at all: it could
sendfile() the data directly from disk to the network client via DMA,
rather than copying it from disk into its own address space and then
from memory to the socket. The downside of a DFS Client cache is that
it would need to be kept consistent, which would probably add a lot of
complexity to the client. It is an interesting idea, though, and I
think we should keep thinking about it.
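To make the zero-copy point concrete, here is a minimal sketch in plain
Java NIO (not actual DataNode code; the block path and port are made up
for illustration) of serving a block file to a client socket with
FileChannel.transferTo(), which maps to sendfile() on Linux:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;

    public class ZeroCopyBlockSender {
        public static void main(String[] args) throws IOException {
            // Hypothetical block file and port, just for illustration.
            String blockPath = "/data/dfs/blk_12345";
            int port = 50010;

            try (ServerSocketChannel server = ServerSocketChannel.open()) {
                server.bind(new InetSocketAddress(port));
                try (SocketChannel client = server.accept();
                     FileChannel block = new FileInputStream(blockPath).getChannel()) {
                    long pos = 0, size = block.size();
                    // transferTo() uses sendfile() on Linux, so the block bytes
                    // go disk -> socket without being copied into the JVM heap.
                    while (pos < size) {
                        pos += block.transferTo(pos, size - pos, client);
                    }
                }
            }
        }
    }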
Thanks,
George
Jun Rao wrote:
Hi, George,
I read the results in your JIRA. Very encouraging. It would be useful
to test the improvement on both cold and warm data (warm data likely
sees a larger improvement). There is a simple way to clear the file
cache on Linux (http://www.linuxinsight.com/proc_sys_vm_drop_caches.html).
An alternative approach is to build an in-memory caching layer on top
of a DFS Client. The advantages are (1) fewer security issues; (2)
probably even better performance, since checksumming can be avoided
once the data is cached in memory; (3) the caching layer can be used
anywhere, not just on nodes owning a block locally. The disadvantage is
that data is buffered twice in memory: once in the caching layer and
once in the OS file cache. One can probably limit the OS file cache
size (not sure if there is an easy way in Linux). What are your
thoughts on this?
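A very rough sketch of the kind of caching layer described above (class
and method names are placeholders, not an existing HDFS API; it just
keeps checksum-verified block data in a bounded LRU map on the client
side):

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Rough sketch only: a bounded LRU cache keyed by block id that a DFS
    // client wrapper could consult before going to the DataNode.
    public class ClientBlockCache {
        private final Map<String, byte[]> cache;

        public ClientBlockCache(int maxEntries) {
            // access-order LinkedHashMap gives simple LRU eviction
            this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                    return size() > maxEntries;
                }
            };
        }

        public synchronized byte[] get(String blockKey) {
            return cache.get(blockKey);
        }

        public synchronized void put(String blockKey, byte[] data) {
            // Data is assumed to be checksum-verified before insertion, so
            // cached reads can skip checksumming, as suggested above.
            cache.put(blockKey, data);
        }
    }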
Jun
george.por...@sun.com wrote on 01/08/2009 10:13:25 AM:
> Hi Jun,
>
> The earlier responses to your email reference the JIRA that I opened
> about this issue. Short-circuiting the primary HDFS datapath does
> improve throughput, and the amount depends on your workload (random
> reads especially). Some initial experimental results are posted to
> that JIRA. A second advantage is that since the JVM hosting the HDFS
> client is doing the reading, the O/S will satisfy future disk requests
> from the cache, which isn't really possible when you read over the
> network (even to another JVM on the same host).
>
> There are several real disadvantages, the largest of which are that
> (1) it adds a new datapath, and (2) it bypasses various security and
> auditing features of HDFS. I would certainly like to think through a
> cleaner interface for achieving this goal, especially since reading
> local data should be the common case. Any thoughts you might have
> would be appreciated.
>
> Thanks,
> George
>
> Jun Rao wrote:
> > Hi,
> >
> > Today, HDFS always reads through a socket even when the data is
> > local to the client. This adds a lot of overhead, especially for
> > warm reads. It should be possible for a DFS client to test whether a
> > block to be read is local and, if so, bypass the socket and read
> > through the local FS API directly. This should improve random access
> > performance significantly (e.g., for HBase). Has this been
> > considered in HDFS? Thanks,
> >
> > Jun
> >
> >