For most Hadoop use cases, the working set of data for a job far exceeds
the disk/memory capacity of a single machine. For this reason, caching data
does not help most Hadoop workloads. Hadoop clients also have built-in
read-ahead for sequential data access.
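
To make the working-set argument concrete, here is a toy simulation in plain Java. It is not Hadoop code, and the class and method names are illustrative. It assumes an LRU block cache and a job that reads the same input several times in sequential order. When the input has more blocks than the cache holds, each block is evicted before the scan comes back to it, so LRU gets no hits at all. Once the input fits, every pass after the first is served from the cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruScanDemo {
    // Minimal LRU cache of block IDs (illustrative, not a Hadoop class).
    static final class LruCache extends LinkedHashMap<Integer, Boolean> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // true = access order, least recently used first
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
            return size() > capacity; // evict the least recently used block
        }
    }

    // Reads blocks 0..numBlocks-1 in order, `passes` times, and counts cache hits.
    static int countHits(int cacheBlocks, int numBlocks, int passes) {
        LruCache cache = new LruCache(cacheBlocks);
        int hits = 0;
        for (int p = 0; p < passes; p++) {
            for (int block = 0; block < numBlocks; block++) {
                if (cache.get(block) != null) {
                    hits++;
                } else {
                    cache.put(block, Boolean.TRUE); // miss: load block into cache
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Working set (1000 blocks) larger than the cache (100 blocks).
        System.out.println("hits, working set > cache: " + countHits(100, 1000, 3));
        // Working set fits in the cache.
        System.out.println("hits, working set <= cache: " + countHits(1000, 1000, 3));
    }
}
```

Running `main` prints 0 hits for the 1,000-block input with a 100-block cache, and 2,000 hits (passes two and three) when the cache holds all 1,000 blocks.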

If you have a workload that can benefit from caching data, you can always
implement caching as a layer on top of Hadoop. You could write something
like a CacheFileSystem (along the lines of ChecksumFileSystem) that is
layered above a FileSystem client.
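
A rough sketch of that layering idea, written against a hypothetical two-method `SimpleFileSystem` interface so it runs without Hadoop on the classpath. All names here are illustrative assumptions. A real version would presumably extend `org.apache.hadoop.fs.FilterFileSystem` (the class ChecksumFileSystem itself extends) and wrap `open()`. This sketch caches whole files and revalidates each entry against the file's modification time. Every read still costs one metadata lookup, but unchanged files are not re-transferred.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class CacheFileSystemDemo {
    // Hypothetical minimal interface standing in for Hadoop's FileSystem.
    interface SimpleFileSystem {
        byte[] read(String path) throws IOException;
        long modificationTime(String path) throws IOException;
    }

    // In-memory backing store, used here in place of a real HDFS client.
    static class InMemoryFileSystem implements SimpleFileSystem {
        private final Map<String, byte[]> files = new HashMap<>();
        private final Map<String, Long> mtimes = new HashMap<>();
        int reads = 0; // counts reads that reach the backing store

        void write(String path, String content, long mtime) {
            files.put(path, content.getBytes(StandardCharsets.UTF_8));
            mtimes.put(path, mtime);
        }

        @Override
        public byte[] read(String path) throws IOException {
            byte[] data = files.get(path);
            if (data == null) throw new FileNotFoundException(path);
            reads++;
            return data;
        }

        @Override
        public long modificationTime(String path) throws IOException {
            Long mtime = mtimes.get(path);
            if (mtime == null) throw new FileNotFoundException(path);
            return mtime;
        }
    }

    // Decorator layered over any SimpleFileSystem: caches whole-file contents
    // and revalidates each entry against the underlying modification time.
    static class CacheFileSystem implements SimpleFileSystem {
        private final SimpleFileSystem underlying;
        private final Map<String, byte[]> data = new HashMap<>();
        private final Map<String, Long> mtimeAtCache = new HashMap<>();
        int hits = 0;
        int misses = 0;

        CacheFileSystem(SimpleFileSystem underlying) {
            this.underlying = underlying;
        }

        @Override
        public byte[] read(String path) throws IOException {
            long current = underlying.modificationTime(path); // metadata-only check
            Long cachedAt = mtimeAtCache.get(path);
            if (cachedAt != null && cachedAt == current) {
                hits++;
                return data.get(path); // cached copy is still current
            }
            misses++;
            byte[] fresh = underlying.read(path); // fetch and (re)populate the cache
            data.put(path, fresh);
            mtimeAtCache.put(path, current);
            return fresh;
        }

        @Override
        public long modificationTime(String path) throws IOException {
            return underlying.modificationTime(path); // metadata passes straight through
        }
    }

    public static void main(String[] args) throws IOException {
        InMemoryFileSystem base = new InMemoryFileSystem();
        CacheFileSystem fs = new CacheFileSystem(base);
        base.write("/data/part-00000", "v1", 1L);
        fs.read("/data/part-00000"); // miss: first access
        fs.read("/data/part-00000"); // hit: file unchanged
        base.write("/data/part-00000", "v2", 2L);
        String latest = new String(fs.read("/data/part-00000"), StandardCharsets.UTF_8); // miss: file changed
        System.out.println(latest + " hits=" + fs.hits + " misses=" + fs.misses);
    }
}
```

Because `CacheFileSystem` only depends on the interface, it can wrap any implementation. This is the same decorator structure that lets ChecksumFileSystem sit on top of another FileSystem.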

thanks,
dhruba


On Mon, Jul 13, 2009 at 3:18 AM, shruti jain <shruti.jain1...@gmail.com> wrote:

> Hello Everyone,
>
> I am a newbie and need some help. I saw on the Hadoop wiki that there
> can be projects to improve Hadoop and map-reduce performance on the
> available benchmarks (sort, etc.).
>
> In a distributed file system environment, client-side caching can be
> used. In such systems, whenever a file access is required, the client
> has to check the content of its local cache against the server's file
> system. While waiting for the server to respond to this query, the
> client can already execute the requested operations on the data
> available in the cache. If the server responds that the client has the
> most recently modified file, the client can proceed with the
> processing; otherwise it can roll back to a previous state and start
> again with the newer version of the file. This saves processing time
> and CPU cycles.
>
> This can be applied to Hadoop as well. Say we are sorting a file. With
> map-reduce, sorting could be done this way: a client asks the server
> for the modification time of the file and starts executing on the copy
> it has in its cache. When the server responds, the client checks the
> cached copy and proceeds accordingly.
>
> Could anyone please discuss whether this can be done in Hadoop? Is it
> already implemented, or is anyone else working on the same thing? If
> this is not the right place to discuss it, could you direct me to some
> other source of information?
>
> Thank You.
>
> Shruti
>
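
The optimistic scheme proposed in the quoted message can be sketched as follows. This is a hypothetical illustration, not an existing Hadoop feature: the class, the `speculativeSort` method, and the simulated server (plain lambdas that return a modification time and a fresh copy of the data) are all assumptions. The client starts the modification-time check asynchronously, sorts its cached copy in the meantime, and then either keeps that result or discards it ("rolls back") and re-sorts the fresh data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

public class SpeculativeSortDemo {
    // Hypothetical local cached copy of a file plus the mtime it was fetched at.
    static final class CachedFile {
        final List<Integer> records;
        final long mtime;

        CachedFile(List<Integer> records, long mtime) {
            this.records = records;
            this.mtime = mtime;
        }
    }

    // Outcome of a speculative run: sorted output and whether work was redone.
    static final class Result {
        final List<Integer> sorted;
        final boolean rolledBack;

        Result(List<Integer> sorted, boolean rolledBack) {
            this.sorted = sorted;
            this.rolledBack = rolledBack;
        }
    }

    static List<Integer> sortCopy(List<Integer> records) {
        List<Integer> out = new ArrayList<>(records);
        Collections.sort(out);
        return out;
    }

    static Result speculativeSort(CachedFile cached, LongSupplier serverMtime,
                                  Supplier<List<Integer>> fetchFresh) {
        // 1. Ask the (simulated) server for the current mtime without blocking.
        CompletableFuture<Long> check = CompletableFuture.supplyAsync(serverMtime::getAsLong);
        // 2. Meanwhile, speculatively sort the cached copy.
        List<Integer> speculative = sortCopy(cached.records);
        // 3. Wait for the server's answer.
        long current = check.join();
        if (current == cached.mtime) {
            return new Result(speculative, false); // cache was current: keep the work
        }
        // 4. Stale cache: discard the speculative output and redo on fresh data.
        return new Result(sortCopy(fetchFresh.get()), true);
    }

    public static void main(String[] args) {
        CachedFile cached = new CachedFile(List.of(3, 1, 2), 100L);
        Result fresh = speculativeSort(cached, () -> 100L, () -> List.of(9, 8, 7));
        System.out.println(fresh.sorted + " rolledBack=" + fresh.rolledBack);
        Result stale = speculativeSort(cached, () -> 200L, () -> List.of(9, 8, 7));
        System.out.println(stale.sorted + " rolledBack=" + stale.rolledBack);
    }
}
```

The first call keeps the speculative result (`[1, 2, 3] rolledBack=false`). The second sees a newer modification time and re-sorts the fresh copy (`[7, 8, 9] rolledBack=true`). With a stale cache the work is done twice, so this only pays off when cached copies are usually current and the server round-trip is slow relative to the computation.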
