I think this is the wrong angle to approach it from - like you mentioned in
your first post, the Linux file system cache *should* be taking care of this
for us. That it is not is a fault of the current implementation, not an
inherent problem.
I think one solution is HDFS-347 - I'm putting the finishing touches on a
design doc for that JIRA and should have it up in the next day or two.

-Todd

On Tue, Oct 6, 2009 at 5:25 PM, Edward Capriolo <[email protected]> wrote:
> On Tue, Oct 6, 2009 at 6:12 PM, Aaron Kimball <[email protected]> wrote:
> > Edward,
> >
> > Interesting concept. I imagine that implementing "CachedInputFormat"
> > over something like memcached would make for the most straightforward
> > implementation. You could store 64MB chunks in memcached and try to
> > retrieve them from there, falling back to the filesystem on failure.
> > One obvious potential drawback of this is that a memcached cluster
> > might store those blocks on different servers than the file chunks
> > themselves, leading to an increased number of network transfers during
> > the mapping phase. I don't know if it's possible to "pin" the objects
> > in memcached to particular nodes; you'd want to do this for mapper
> > locality reasons.
> >
> > I would say, though, that 1 GB out of 8 GB on a datanode is somewhat
> > ambitious. It's been my observation that people tend to write
> > memory-hungry mappers. If you've got 8 cores in a node, and 1 GB each
> > have already gone to the OS, the datanode, and the tasktracker, that
> > leaves only 5 GB for task processes. Running 6 or 8 map tasks
> > concurrently can easily gobble that up. On a 16 GB datanode with 8
> > cores, you might get that much wiggle room though.
> >
> > - Aaron
> >
> > On Tue, Oct 6, 2009 at 8:16 AM, Edward Capriolo <[email protected]> wrote:
> >> After looking at the HBaseRegionServer and its functionality, I began
> >> wondering if there is a more general use case for memory caching of
> >> HDFS blocks/files. In many use cases people wish to store data on
> >> Hadoop indefinitely; however, the last day, last week, and last month
> >> of data are probably the most actively used.
> >>
> >> For some Hadoop clusters the amount of raw new data could be less
> >> than the RAM in the cluster.
> >>
> >> Also, some data will be used repeatedly: the same source data may be
> >> used to generate multiple result sets, and those results may be used
> >> as the input to other processes.
> >>
> >> I am thinking an answer could be to dedicate an amount of physical
> >> memory on each DataNode, or on several dedicated nodes, to a
> >> distributed memcache-like layer. Managing this cache should be
> >> straightforward since Hadoop blocks are pretty much static. (So for a
> >> DataNode with 8 GB of memory, dedicate 1 GB to HadoopCacheServer.)
> >> If you had 1000 nodes, that cache would be quite large.
> >>
> >> Additionally, we could create a new file system type, cachedhdfs,
> >> implemented as a facade, or possibly implement CachedInputFormat or
> >> CachedOutputFormat.
> >>
> >> I know that the underlying filesystems have a cache, but I think
> >> Hadoop writing intermediate data is going to evict some of the data
> >> which "should be" semi-permanent.
> >>
> >> So has anyone looked into something like this? This was the closest
> >> thing I found:
> >>
> >> http://issues.apache.org/jira/browse/HADOOP-288
> >>
> >> My goal here is to keep recent data in memory so that tools like Hive
> >> can get a big boost on queries for new data.
> >>
> >> Does anyone have any ideas?
> >
>
> Aaron,
>
> Yes, 1 GB out of 8 GB was just an arbitrary value I picked. Remember
> that 16K of RAM did get a man to the moon. :) I am thinking the value
> would be configurable, say dfs.cache.mb.
>
> Also there are the details of cache eviction, and possibly including
> and excluding paths and files.
>
> Other than the InputFormat concept, we could plug the cache directly
> into the DFSClient. In this way the cache would always end up on the
> node where the data was. Otherwise the InputFormat would have to manage
> that, which would be a lot of work. I think if we prove the concept we
> can then follow up and get it more optimized.
>
> I am poking around the Hadoop internals to see what options we have.
> For my first implementation I will probably patch some code, run some
> tests, and profile performance.
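For anyone prototyping the idea discussed in this thread, here is a minimal
Java sketch of the read path Aaron and Edward describe: try the cache first,
fall back to the real read on a miss, and evict in LRU order against a
configurable byte budget (standing in for the proposed dfs.cache.mb
setting). This is purely illustrative - the class name, the loader callback,
and the eviction policy are all assumptions, and plain in-process Java
stands in for both memcached and the actual DFSClient/HDFS read.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical block cache sketch; not real Hadoop or memcached API.
public class BlockCache {
    private final long capacityBytes;          // budget, e.g. from dfs.cache.mb
    private long usedBytes = 0;
    private final LinkedHashMap<String, byte[]> blocks;

    public BlockCache(long capacityMb) {
        this.capacityBytes = capacityMb * 1024 * 1024;
        // accessOrder=true makes iteration order least-recently-used first.
        this.blocks = new LinkedHashMap<>(16, 0.75f, true);
    }

    /**
     * Return the block from cache; on a miss, invoke the loader (which in a
     * real implementation would be the normal HDFS/DFSClient block read),
     * cache the result, and evict old blocks if over budget.
     */
    public synchronized byte[] read(String blockId, Function<String, byte[]> loader) {
        byte[] data = blocks.get(blockId);
        if (data != null) {
            return data;                        // cache hit
        }
        data = loader.apply(blockId);           // fall back to the filesystem
        usedBytes += data.length;
        blocks.put(blockId, data);
        evictIfNeeded();
        return data;
    }

    /** Drop least-recently-used blocks until we are back under budget. */
    private void evictIfNeeded() {
        Iterator<Map.Entry<String, byte[]>> it = blocks.entrySet().iterator();
        while (usedBytes > capacityBytes && it.hasNext()) {
            Map.Entry<String, byte[]> eldest = it.next();
            usedBytes -= eldest.getValue().length;
            it.remove();
        }
    }

    public synchronized boolean contains(String blockId) {
        return blocks.containsKey(blockId);
    }
}
```

A real version would of course need to sit either behind a
CachedInputFormat facade or inside the DFSClient as Edward suggests, and
handle multi-process access; the point here is just that eviction against a
fixed budget is simple when blocks are immutable, as they are in HDFS.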
