George,

Thanks for the comment. Note that HDFS files are never updated
in-place, which should make it easy to deal with cache consistency.
(Rough sketches of a few of the ideas discussed in this thread are
appended below the quoted messages.)

Jun

IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA 95120-6099
jun...@almaden.ibm.com
(408)927-1886 (phone)
(408)927-3215 (fax)

george.por...@sun.com wrote on 01/12/2009 10:11:08 AM:

> Hi Jun,
>
> Thanks for the pointer on clearing the disk cache, as well as the
> suggestion for creating a DFS Client cache layer. As for the
> double-buffering overhead, I don't think there is going to be a large
> benefit to buffering in the DataNode, since the DataNode itself does
> not use the data in the buffer (it just forwards that data to HDFS
> clients). With the ability to perform zero-copy I/O operations, it
> probably shouldn't buffer any data at all, since it could just
> sendfile() the data directly from the disk to the network client via
> DMA, rather than copying it from disk into its memory space and then
> from memory to the socket. The downside of a DFS client cache is that
> it would need to be kept consistent, which would probably add a lot of
> complexity to the client. It is an interesting idea, though, and I
> think we should keep thinking about it.
>
> Thanks,
> George
>
> Jun Rao wrote:
> >
> > Hi George,
> >
> > I read the results in your JIRA. Very encouraging. It would be
> > useful to test the improvement on both cold and warm data (warm data
> > likely shows a larger improvement). There is a simple way to clear
> > the file cache on Linux
> > (http://www.linuxinsight.com/proc_sys_vm_drop_caches.html).
> >
> > An alternative approach is to build an in-memory caching layer on
> > top of a DFS Client. The advantages are (1) fewer security issues;
> > (2) probably even better performance, since the checksum can be
> > avoided once the data is cached in memory; (3) the caching layer can
> > be used anywhere, not just on nodes owning a block locally. The
> > disadvantage is that data is buffered twice in memory: once in the
> > caching layer and once in the OS file cache. One could probably
> > limit the OS file cache size (not sure if there is an easy way in
> > Linux). What's your thought on this?
> >
> > Jun
> >
> > george.por...@sun.com wrote on 01/08/2009 10:13:25 AM:
> >
> > > Hi Jun,
> > >
> > > The earlier responses to your email reference the JIRA that I
> > > opened about this issue. Short-circuiting the primary HDFS
> > > datapath does improve throughput, and the amount depends on your
> > > workload (random reads especially). Some initial experimental
> > > results are posted to that JIRA. A second advantage is that, since
> > > the JVM hosting the HDFS client is doing the reading, the O/S will
> > > satisfy future disk requests from the cache, which isn't really
> > > possible when you read over the network (even to another JVM on
> > > the same host).
> > >
> > > There are several real disadvantages, the largest of which are
> > > that it 1) adds a new datapath and 2) bypasses various security
> > > and auditing features of HDFS. I would certainly like to think
> > > through a cleaner interface for achieving this goal, especially
> > > since reading local data should be the common case. Any thoughts
> > > you might have would be appreciated.
> > >
> > > Thanks,
> > > George
> > >
> > > Jun Rao wrote:
> > > > Hi,
> > > >
> > > > Today, HDFS always reads through a socket even when the data is
> > > > local to the client. This adds a lot of overhead, especially for
> > > > warm reads.
> > > > It should be possible for a DFS client to test whether a block
> > > > to be read is local and, if so, bypass the socket and read
> > > > through the local FS API directly. This should improve random
> > > > access performance significantly (e.g., for HBase). Has this
> > > > been considered in HDFS? Thanks,
> > > >
> > > > Jun
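
Sketch 1: the client-side caching layer. Since HDFS files are never
updated in-place (as noted above), a cached block can never go stale,
so the cache needs no invalidation protocol at all; the only eviction
policy required is something like LRU to bound memory. The class,
method, and key names below are all made up for illustration (this is
not existing HDFS code); the key is simply the file path plus the
block's start offset.

import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical client-side block cache. Blocks are write-once, so a
 * cached copy can never go stale; LRU eviction only bounds memory.
 */
public class BlockCache {

    /** A block is identified by its file and its start offset within the file. */
    public static final class Key {
        final String path;
        final long blockStart;

        Key(String path, long blockStart) {
            this.path = path;
            this.blockStart = blockStart;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return blockStart == k.blockStart && path.equals(k.path);
        }

        @Override public int hashCode() {
            return 31 * path.hashCode() + Long.hashCode(blockStart);
        }
    }

    /** Supplies block bytes on a cache miss, e.g. a normal checksummed DFS read. */
    public interface BlockReader {
        byte[] read(Key key) throws IOException;
    }

    private final Map<Key, byte[]> lru;

    public BlockCache(final int maxEntries) {
        // Access-ordered LinkedHashMap gives a simple LRU bound.
        this.lru = new LinkedHashMap<Key, byte[]>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<Key, byte[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached block, loading and checksumming it at most once. */
    public synchronized byte[] get(Key key, BlockReader reader) throws IOException {
        byte[] data = lru.get(key);
        if (data == null) {
            data = reader.read(key);  // checksum verified here, on the miss path
            lru.put(key, data);       // no invalidation needed: blocks are immutable
        }
        return data;
    }
}

The checksum advantage mentioned above falls out naturally: the
checksum is verified once on the read that populates the cache, and
later hits skip it.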
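
Sketch 2: the sendfile() point. In Java the same zero-copy path is
exposed as FileChannel.transferTo(), which for a file-to-socket
transfer on Linux typically maps to sendfile(2), so the block bytes go
from disk to the socket without passing through the JVM heap at all.
A minimal sketch; the block-file path, host, and port below are
placeholders, not actual DataNode values.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        try (FileChannel src = FileChannel.open(
                 Paths.get("/data/dn/blk_1234"), StandardOpenOption.READ); // placeholder block file
             SocketChannel dst = SocketChannel.open(
                 new InetSocketAddress("client-host", 4000))) {            // placeholder peer
            long pos = 0;
            long remaining = src.size();
            // transferTo() may send less than requested, so loop until done.
            // On Linux this becomes sendfile(2): disk -> socket with no copy
            // into this process's user-space buffers.
            while (remaining > 0) {
                long sent = src.transferTo(pos, remaining, dst);
                pos += sent;
                remaining -= sent;
            }
        }
    }
}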
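
Sketch 3: the local-read short-circuit itself. The shape of the check
is: ask the NameNode where the block replicas live
(FileSystem.getFileBlockLocations() is a real call), and if one of
them is this host, read the underlying block file with plain java.io
instead of going through the DataNode socket. The
resolveLocalBlockFile() helper is entirely hypothetical; today there
is no public client API that maps a block to the DataNode's local
file, and that missing (and security-sensitive) mapping is exactly the
interface being debated in this thread.

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.net.InetAddress;

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalReadSketch {

    /** Read len bytes at offset, short-circuiting the DataNode when the block is local. */
    public static int read(FileSystem fs, Path path, long offset, byte[] buf, int len)
            throws IOException {
        FileStatus status = fs.getFileStatus(path);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, offset, len);
        String localHost = InetAddress.getLocalHost().getHostName();

        if (blocks.length > 0 && hasHost(blocks[0], localHost)) {
            // Hypothetical: a trusted way to map (file, offset) to the block
            // file the local DataNode keeps on disk. Nothing like this exists
            // in the client API today.
            File blockFile = resolveLocalBlockFile(path, offset);
            long offsetInBlock = offset - blocks[0].getOffset();
            try (RandomAccessFile raf = new RandomAccessFile(blockFile, "r")) {
                raf.seek(offsetInBlock);
                return raf.read(buf, 0, len);  // plain local FS read, no socket
            }
        }

        // Fall back to the normal socket path through the DataNode.
        try (FSDataInputStream in = fs.open(path)) {
            in.seek(offset);
            return in.read(buf, 0, len);
        }
    }

    private static boolean hasHost(BlockLocation block, String host) throws IOException {
        for (String h : block.getHosts()) {
            if (h.equals(host)) return true;
        }
        return false;
    }

    private static File resolveLocalBlockFile(Path path, long offset) {
        // Placeholder for the block-to-local-file lookup under discussion.
        throw new UnsupportedOperationException("hypothetical");
    }
}

Note that reading the block file directly like this also skips HDFS's
checksum verification, on top of the security and auditing concerns
raised above.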
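
Sketch 4: clearing the Linux file cache for cold-read runs. The linked
page boils down to writing to /proc/sys/vm/drop_caches as root (1 =
page cache, 2 = dentries and inodes, 3 = both), after syncing dirty
pages. A tiny sketch of doing that from Java before a benchmark run;
this is just the mechanism from the link, not anything HDFS provides.

import java.io.FileWriter;
import java.io.IOException;

public class DropPageCache {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Flush dirty pages first so dropping the caches cannot lose writes.
        new ProcessBuilder("sync").inheritIO().start().waitFor();
        // "3" drops both the page cache and the dentry/inode caches (requires root).
        try (FileWriter w = new FileWriter("/proc/sys/vm/drop_caches")) {
            w.write("3");
        }
    }
}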