Hi, I am trying to do 'on-demand MapReduce', i.e. something that returns in a reasonable time (a few seconds).
My dataset is relatively small and can fit into my datanode's memory. Is it possible to keep a block in the datanode's memory so that the next job responds much faster? The majority of the time during a job run appears to be spent in the 'HDFS_BYTES_READ' phase. I have tried using setNumTasksToExecutePerJvm, but the block still seems to be cleared from memory after the job finishes. Thanks!
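In case the exact setting matters: setNumTasksToExecutePerJvm is the programmatic form of the old-API mapred.job.reuse.jvm.num.tasks property, so the equivalent mapred-site.xml fragment I have is roughly this (a sketch; -1 means each task JVM is reused for an unlimited number of tasks within a job):

```xml
<!-- Equivalent of JobConf.setNumTasksToExecutePerJvm(-1) in the
     org.apache.hadoop.mapred API: reuse task JVMs within a job. -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
</property>
```

My understanding is that this only keeps the task JVM alive across tasks of the same job; it does not pin HDFS blocks in the datanode's memory, which may be why the data is re-read on the next job.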