The challenge with this design is that accessing the same data over and over again is an uncommon use case for Hadoop. Hadoop's bread and butter is streaming through large datasets that do not fit in memory. Also, your shuffle-sort-spill is going to play havoc on any file-system-based cache. The distributed cache roughly fits this role, except that it does not persist after a job.
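To make the streaming point concrete, here is a rough sketch (not anything from HDFS itself) of an LRU block cache being scanned sequentially. The names `LRUCache` and `hit_rate` and the block counts are made up for illustration; the assumption is a plain least-recently-used eviction policy on the client.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache that tracks which block ids are resident (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            # Cache hit: mark the block as most recently used.
            self.blocks.move_to_end(block_id)
            self.hits += 1
        else:
            # Cache miss: load the block, evicting the oldest one if full.
            self.misses += 1
            self.blocks[block_id] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)


def hit_rate(capacity, dataset_blocks, passes):
    """Scan blocks 0..dataset_blocks-1 in order, `passes` times."""
    cache = LRUCache(capacity)
    for _ in range(passes):
        for block_id in range(dataset_blocks):
            cache.read(block_id)
    return cache.hits / (cache.hits + cache.misses)


if __name__ == "__main__":
    # Dataset larger than the cache: every block is evicted before reuse.
    print(hit_rate(capacity=100, dataset_blocks=1000, passes=5))
    # Dataset smaller than the cache: only the first pass misses.
    print(hit_rate(capacity=100, dataset_blocks=50, passes=5))
```

With these numbers the first scan never hits the cache, because each block is evicted before the next pass reaches it, while the second case hits on every pass after the first. That is roughly why a cache only pays off for the small, repeatedly read files.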
Replicating content to N nodes is also not a hard problem to tackle (you can hack up a content delivery system with ssh+rsync and get similar results). The approach often taken has been to keep data that is accessed repeatedly and fits in memory in some other system (hbase/cassandra/mysql/whatever).

Edward

On Mon, Jan 16, 2012 at 11:33 AM, Rita <[email protected]> wrote:

> Thanks. I believe this is a good feature to have for clients, especially if
> you are reading the same large file over and over.
>
> On Sun, Jan 15, 2012 at 7:33 PM, Todd Lipcon <[email protected]> wrote:
>
> > There is some work being done in this area by some folks over at UC
> > Berkeley's AMP Lab in coordination with Facebook. I don't believe it
> > has been published quite yet, but the title of the project is "PACMan"
> > -- I expect it will be published soon.
> >
> > -Todd
> >
> > On Sat, Jan 14, 2012 at 5:30 PM, Rita <[email protected]> wrote:
> >
> > > After reading this article,
> > > http://www.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/ , I was
> > > wondering if there is a filesystem cache for HDFS. For example, if a
> > > large file (10 gigabytes) keeps getting accessed on the cluster, instead
> > > of fetching it over the network every time, why not store the content of
> > > the file locally on the client itself? A use case on the client would be
> > > like this:
> > >
> > > <property>
> > >   <name>dfs.client.cachedirectory</name>
> > >   <value>/var/cache/hdfs</value>
> > > </property>
> > >
> > > <property>
> > >   <name>dfs.client.cachesize</name>
> > >   <description>in megabytes</description>
> > >   <value>100000</value>
> > > </property>
> > >
> > > Any thoughts on a feature like this?
> > >
> > > --
> > > --- Get your facts first, then you can distort them as you please.--
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>
> --
> --- Get your facts first, then you can distort them as you please.--
