Speculative Processing in Hadoop

shruti jain Mon, 13 Jul 2009 03:18:59 -0700

Hello Everyone,

I am a newbie and need some help. I saw on the Hadoop wiki that there
can be projects to improve Hadoop and MapReduce performance on
available benchmarks (sort, etc.).


In a distributed file system, client-side caching can be used. In
such systems, whenever a file is accessed, the client has to check
the contents of its local cache against the server's file system.
While the server is responding to this query, the client can already
execute the requested operations on the data available in the cache.
If the server responds that the client has the most recently modified
version of the file, the client can proceed with the processing;
otherwise it can roll back to a previous state and start again with
the newer version of the file. This should save time, since the
client's CPU cycles are not wasted waiting for the server.
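
To make the idea concrete, here is a minimal sketch in Python. It is
only an illustration under assumptions: the names speculative_read,
fetch_server_version, fetch_server_data and process are hypothetical
and are not part of any Hadoop API, and a simple version number stands
in for whatever freshness check the server provides.

```python
from concurrent.futures import ThreadPoolExecutor


def speculative_read(cached_data, cached_version,
                     fetch_server_version, fetch_server_data, process):
    """Process the cached copy while the server validates it.

    Returns (result, used_cache). All callables are supplied by the
    caller and are hypothetical stand-ins for real client/server calls.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Ask the server for the current version without blocking.
        version_future = pool.submit(fetch_server_version)
        # Meanwhile, speculatively work on the cached copy.
        speculative_future = pool.submit(process, cached_data)

        server_version = version_future.result()
        if server_version == cached_version:
            # Cache was fresh: the speculative work is the answer.
            return speculative_future.result(), True

        # Cache was stale: discard the speculative result (the
        # "rollback") and redo the work on the current data.
        # cancel() only takes effect if the task has not started yet.
        speculative_future.cancel()
        return process(fetch_server_data()), False
```

The gain comes only when the cache is usually fresh; when it is stale,
the speculative work is thrown away and the client pays for both runs.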

This can be applied to Hadoop as well. Say we are sorting a file with
MapReduce. The client asks the server for the modification time of
the file and starts executing on the copy it has in its cache. When
the server responds, the client can compare that modification time
with the one of its cached copy and proceed accordingly.
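
For the sorting case, the check on the modification time could look
roughly like the sketch below. This is a single-process toy, not a
MapReduce job: FakeServer, get_mtime, read and speculative_sort are
made-up names for illustration, and the cache is just a dict.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeServer:
    """Hypothetical in-memory stand-in for the file server."""

    def __init__(self, lines, mtime):
        self.lines = lines
        self.mtime = mtime

    def get_mtime(self):
        return self.mtime

    def read(self):
        return list(self.lines), self.mtime


def speculative_sort(server, cache):
    """Sort the cached lines while the mtime request is in flight.

    `cache` is a dict with keys "lines" and "mtime"; it is refreshed
    in place when the cached copy turns out to be stale.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        mtime_request = pool.submit(server.get_mtime)  # request sent
        speculative = sorted(cache["lines"])           # sort the cached copy
        server_mtime = mtime_request.result()          # server responds

    if server_mtime == cache["mtime"]:
        return speculative  # cached copy was current
    # Stale copy: drop the speculative result, refresh the cache,
    # and sort the newer version of the file.
    cache["lines"], cache["mtime"] = server.read()
    return sorted(cache["lines"])
```

A quick usage example, again with made-up data:

    server = FakeServer(["b", "a", "c"], mtime=100)
    cache = {"lines": ["b", "a", "c"], "mtime": 100}
    speculative_sort(server, cache)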

Could anyone please discuss whether this can be done in Hadoop? Is it
already implemented, or is anyone else working on it? If this is not
the right place to discuss this, could you direct me to some other
source of information?

Thank You.

Shruti
