Hi,

I am trying to do 'on-demand MapReduce' - jobs that return in a
reasonable time (a few seconds).

My dataset is relatively small and fits into my datanode's memory. Is it
possible to keep a block cached in the datanode's memory so that the next
job responds much more quickly? Most of the time spent during a job run
appears to go to the 'HDFS_BYTES_READ' part of the job. I have tried
using setNumTasksToExecutePerJvm, but the block still seems to be evicted
from memory after the job.
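For reference, here is roughly how I am enabling JVM reuse - I call
JobConf.setNumTasksToExecutePerJvm(-1) in code, which as far as I know is
equivalent to this mapred-site.xml fragment (a sketch, in case I am
setting it incorrectly):

```xml
<!-- mapred-site.xml: reuse one task JVM for an unlimited number of
     tasks within a job (-1 = no limit). Equivalent to calling
     JobConf.setNumTasksToExecutePerJvm(-1) on the job. -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
</property>
```

As far as I understand, this only reuses the JVM for tasks within a
single job, which may be why it does not help across jobs.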

thanks!
