Joydeep's ramfs solution is even simpler :)
Jason Venner wrote:
I would propose either storing the files in HBase, which will keep an
active copy available, or replicating the files manually to all of your
machines and having a task that mmaps the files into shared memory. The
mmap can lock the pages and fault them in to ensure they are resident.
Then have your jobs attach to the shared memory, or simply read the
files normally.
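
For illustration, here is a minimal Python sketch of the mmap half of
this idea, under a few assumptions: the file already exists on the local
disk and is not empty, and the function name map_and_fault_in is made up
for this example. It only maps the file and touches each page so the
pages are faulted in. It does not lock them (that would need mlock(2),
for example through ctypes or a small C helper), and it is not taken
from Hadoop itself.

```python
import mmap
import os


def map_and_fault_in(path):
    """Map a file read-only and touch every page so it is faulted in.

    Hypothetical helper for illustration only. Pages are not locked,
    so the kernel may still evict them under memory pressure.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Length 0 maps the whole file; an empty file raises ValueError.
        mapped = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
    finally:
        # The mapping keeps its own duplicate of the descriptor.
        os.close(fd)

    # Read one byte per page to force the kernel to load that page.
    checksum = 0
    for offset in range(0, len(mapped), mmap.PAGESIZE):
        checksum ^= mapped[offset]
    return mapped, checksum
```

A long-running task on each node could call this once per replicated
file and keep the returned mapping open. Other processes that map the
same file then share the same page-cache pages, and plain reads of the
file should also be served from memory while the pages stay resident.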
Shimi K wrote:
Does Hadoop cache frequently used map input files (LRU/MRU)? Or does it
load files from disk each time a file is needed, even if the same file
was required by the last job on the same node?
I am currently using version 0.14.4
- Shimi
--
Jason Venner
Attributor - Publish with Confidence <http://www.attributor.com/>
Attributor is hiring Hadoop Wranglers, contact if interested