For most Hadoop use cases, the working set of data for a job far exceeds the disk and memory capacity of a single machine. For this reason, caching data does not help most Hadoop workloads. Hadoop clients also have built-in read-ahead for sequential data access.
If you have a workload that can benefit from caching data, you can always implement it as a layer on top of Hadoop. You can write something like a CacheFileSystem (along the lines of ChecksumFileSystem) that can be layered above a FileSystem client.

thanks,
dhruba

On Mon, Jul 13, 2009 at 3:18 AM, shruti jain <shruti.jain1...@gmail.com> wrote:
> Hello Everyone,
>
> I am a newbie and need some help. I saw on Hadoop wiki that there can
> be projects to improve Hadoop and map-reduce performance on available
> benchmarks(sort etc)..
>
> In a distributed file system environment, caching can be followed. In
> such systems, whenever a file access is required, the client has to
> check the content in the local cache with reference to the server file
> system. By the time server responds to this query of the client, the
> client can execute the requested operations on the data available in
> the cache. If the server responds that the client has the most
> recently modified file then the client can proceed with the processing
> otherwise it can rollback to a previous state and start with newer
> version of the file. This will save processing power, CPU cycles time.
>
> This can be applied to Hadoop as well. Say we are sorting a file. With
> map-reduce sorting can be done this way. A client requests the server
> about the modification time of the file and starts execution on the
> file it has in the cache. When server responds it can check the cached
> copy and proceed accordingly.
>
> Could any one please discuss whether this can be done in Hadoop or
> not. Is it already implemented or is anyone else working on the same.
> If this is not the right place to discuss then can you direct me to
> some other source of information.
>
> Thank You.
>
> Shruti
>
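
For what it is worth, here is a rough sketch of how the CacheFileSystem idea suggested in the reply might look. It is plain Java and deliberately does not use the real Hadoop API: SimpleFileSystem, InMemoryFileSystem, and the methods on CacheFileSystem are hypothetical names made up for illustration. A real version would presumably extend Hadoop's FilterFileSystem, as ChecksumFileSystem does. The sketch also checks the modification time before serving cached bytes, which is simpler than the optimistic "start on the cached copy, roll back if stale" scheme described in the question. The cache is unbounded and not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal file system client; NOT Hadoop's FileSystem API.
interface SimpleFileSystem {
    long getModificationTime(String path);
    byte[] read(String path);
}

// Hypothetical in-memory backing store, standing in for the remote file system.
class InMemoryFileSystem implements SimpleFileSystem {
    private final Map<String, byte[]> files = new HashMap<>();
    private final Map<String, Long> mtimes = new HashMap<>();
    int reads = 0; // number of full reads served by the backing store

    void write(String path, byte[] data, long mtime) {
        files.put(path, data);
        mtimes.put(path, mtime);
    }

    // Assumes the path exists; a missing path throws NullPointerException.
    public long getModificationTime(String path) {
        return mtimes.get(path);
    }

    public byte[] read(String path) {
        reads++;
        return files.get(path);
    }
}

// Hypothetical caching layer wrapped around another SimpleFileSystem.
class CacheFileSystem implements SimpleFileSystem {
    private static final class Entry {
        final long mtime;
        final byte[] data;

        Entry(long mtime, byte[] data) {
            this.mtime = mtime;
            this.data = data;
        }
    }

    private final SimpleFileSystem inner;
    private final Map<String, Entry> cache = new HashMap<>();
    private int hits = 0;

    CacheFileSystem(SimpleFileSystem inner) {
        this.inner = inner;
    }

    public long getModificationTime(String path) {
        return inner.getModificationTime(path);
    }

    public byte[] read(String path) {
        // Cheap metadata call first: is the cached copy still current?
        long mtime = inner.getModificationTime(path);
        Entry cached = cache.get(path);
        if (cached != null && cached.mtime == mtime) {
            hits++;
            return cached.data;
        }
        // Missing or stale: fetch from the wrapped file system and remember it.
        byte[] data = inner.read(path);
        cache.put(path, new Entry(mtime, data));
        return data;
    }

    int hits() {
        return hits;
    }
}
```

Wrapping a client would then be a matter of `new CacheFileSystem(backing)` and calling `read(path)` as before. Repeated reads of an unchanged file cost one metadata call each instead of a full read, which only pays off when the working set actually fits in the cache.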