Hadoop doesn't support this natively, so if you need this kind of
functionality you'd have to build it into your application. What worries me
is the race condition in deciding which task should be the first to create
the ramfs and load the data.
If you can atomically check whether the ramfs has been created and the data
loaded, and perform the creation/load only if it hasn't, then things should
work.
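One way to get that atomicity on a single node is to exploit the fact that mkdir() is atomic on POSIX filesystems: the first task to create a lock directory wins and does the load, while the others wait for a "ready" marker. A minimal sketch, assuming this runs in each map task's setup; the paths and marker names are made up for illustration, and the actual ramfs mount/load is left as a comment:

```java
import java.io.File;
import java.io.IOException;

// Sketch of an atomic "create ramfs and load data exactly once per node"
// guard. The paths below are hypothetical; in practice they would point
// into the ramfs mount on the task node.
public class RamfsGuard {
    static final File LOCK  = new File("/tmp/ramfs-demo.lock");  // assumed path
    static final File READY = new File("/tmp/ramfs-demo.ready"); // assumed path

    // mkdir is atomic: exactly one caller succeeds, so exactly one
    // task becomes the loader.
    static boolean tryBecomeLoader() {
        return LOCK.mkdir();
    }

    // The winner calls this after the data is fully loaded.
    static void markReady() throws IOException {
        READY.createNewFile();
    }

    // Losers poll until the winner signals that loading is done.
    static void waitUntilReady() throws InterruptedException {
        while (!READY.exists()) {
            Thread.sleep(100);
        }
    }

    public static void main(String[] args) throws Exception {
        if (tryBecomeLoader()) {
            // ... mount the ramfs and load the static data here ...
            markReady();
            System.out.println("loaded");
        } else {
            waitUntilReady();
            System.out.println("reused");
        }
    }
}
```

The polling loser should also have a timeout in real code, so that a loader that dies mid-load doesn't hang every other task on the node forever.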
If atomicity cannot be guaranteed, you might consider this -
1) Run a map-only job that creates the ramfs and loads the data (if
your cluster is small, you can do this manually). You can use the
distributed cache to distribute the data you want to load.
2) Run your job that processes the data.
3) Run a third job to delete the ramfs.


On 9/5/08 1:29 PM, "Amit Kumar Singh" <[EMAIL PROTECTED]> wrote:

> Can we use something like RAM FS to share static data across map tasks.
> 
> Scenario,
> 1) Quad-core machine
> 2) Two 1-TB disks
> 3) 8 GB RAM
> 
> Now I need ~2.7 GB of RAM per map process to load some static data in
> memory, using which I would be processing data (CPU-intensive jobs).
> 
> Can I share memory across mappers on the same machine so that the memory
> footprint is smaller and I can run more than 4 mappers simultaneously,
> utilizing all 4 cores.
> 
> Can we use stuff like RamFS
> 

