I would propose either storing the files in HBase, which will keep an active copy available, or replicating the files manually to all of your machines and running a task that mmaps each file into shared memory. That task can lock the mapped pages and fault them in to ensure they stay resident.

Then have your jobs attach the shared memory, or simply read the files normally.
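
A rough sketch of the fault-in step in Python, under a few assumptions: the helper name map_and_prefault is made up for illustration, the file is non-empty, and touching one byte per page is enough to bring each page into memory. It does not lock the pages; that would need something like mlock(2) from C or another binding, and without it the kernel is still free to evict pages under memory pressure.

```python
import mmap
import os
import tempfile


def map_and_prefault(path):
    """Map `path` read-only and touch one byte per page to fault it in.

    Hypothetical helper for illustration. It does NOT lock pages in memory.
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            raise ValueError("cannot mmap an empty file")
        # Length 0 maps the whole file; the mapping outlives the file object.
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Reading one byte from each page forces the kernel to load that page.
    for offset in range(0, size, mmap.PAGESIZE):
        m[offset]
    return m


if __name__ == "__main__":
    # Demo on a throwaway file; a real task would map the replicated input.
    fd, demo_path = tempfile.mkstemp()
    os.write(fd, b"x" * (4 * mmap.PAGESIZE))
    os.close(fd)
    mapping = map_and_prefault(demo_path)
    print(len(mapping), "bytes mapped")
    mapping.close()
    os.remove(demo_path)
```

The process holding the mapping has to stay alive for this to help; jobs that then read the same file normally should be served from the page cache rather than from disk, as long as the pages have not been evicted.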

Shimi K wrote:
Does Hadoop cache map input files (frequently used/LRU/MRU)? Or does it load
files from the disk each time a file is needed, no matter if it was the same
file that was required by the last job on the same node?

I am currently using version 0.14.4

- Shimi

--
Jason Venner
Attributor - Publish with Confidence <http://www.attributor.com/>
Attributor is hiring Hadoop Wranglers, contact if interested
