Well, the classic solution to that on Linux would be to mmap a cache file into multiple processes. No idea if you can do something like that with Java.
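(For what it's worth, java.nio does expose memory-mapped files through FileChannel.map(), so something like it should be possible. A minimal, untested sketch; the file path is made up and the cache file is assumed to be pre-built on a ramfs/tmpfs mount:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class SharedCacheReader {
    public static void main(String[] args) throws Exception {
        // Illustrative path: a cache file pre-built on /dev/shm by a loader step.
        RandomAccessFile raf = new RandomAccessFile("/dev/shm/static-cache.bin", "r");
        FileChannel ch = raf.getChannel();

        // A single mapping is capped at Integer.MAX_VALUE bytes (~2 GB),
        // so a ~2.7 GB cache would need two mappings side by side.
        long size = Math.min(ch.size(), Integer.MAX_VALUE);
        MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);

        // The kernel shares the physical pages behind the mapping, so every
        // JVM that maps the same file reads the same single copy of the data.
        long first = buf.getLong(0);  // example access, assumes file >= 8 bytes
        System.out.println("first word: " + first);

        ch.close();
        raf.close();
    }
}

Since the mapping is read-only and backed by one file, the memory cost is paid once per machine, not once per mapper.)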
Andreas

On Friday 05 September 2008 10:28:37 Devaraj Das wrote:
> Hadoop doesn't support this natively. So if you need this kind of
> functionality, you'd need to code your application in such a way. But I am
> worried about the race conditions in determining which task should first
> create the ramfs and load the data.
> If you can provide atomicity in determining whether the ramfs has been
> created and the data loaded, and if not, then do the creation/load, then
> things should work.
> If atomicity cannot be guaranteed, you might consider this:
> 1) Run a job with only maps that creates the ramfs and loads the data (if
> your cluster is small you can do this manually). You can use the
> distributed cache to store the data you want to load.
> 2) Run your job that processes the data.
> 3) Run a third job to delete the ramfs.
>
> On 9/5/08 1:29 PM, "Amit Kumar Singh" <[EMAIL PROTECTED]> wrote:
> > Can we use something like a RAM FS to share static data across map
> > tasks?
> >
> > Scenario:
> > 1) Quad-core machine
> > 2) Two 1-TB disks
> > 3) 8 GB RAM
> >
> > Now I need ~2.7 GB of RAM per map process to load some static data into
> > memory, using which I would be processing the data (CPU-intensive jobs).
> >
> > Can I share memory across mappers on the same machine so that the memory
> > footprint is lower and I can run more than 4 mappers simultaneously,
> > utilizing all 4 cores?
> >
> > Can we use stuff like RamFS?
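FWIW, here is a rough, untested sketch of the atomic check-and-create guard Devaraj describes. It assumes every task on a node sees the same local ramfs path; the file names and the loader are purely illustrative. File.createNewFile() is atomic on a local filesystem, so exactly one task wins the race and does the load, while the rest poll for a "ready" marker:

import java.io.File;

public class RamfsGuard {
    // Illustrative paths on a shared local ramfs mount.
    private static final File LOCK  = new File("/dev/shm/cache.lock");
    private static final File READY = new File("/dev/shm/cache.ready");

    public static void ensureLoaded() throws Exception {
        if (LOCK.createNewFile()) {
            // We won the race: populate the ramfs, then publish completion.
            loadStaticData("/dev/shm/cache");
            READY.createNewFile();
        } else {
            // Another task is loading; wait until it signals it is done.
            while (!READY.exists()) {
                Thread.sleep(500);
            }
        }
    }

    private static void loadStaticData(String dir) {
        // ... copy the distributed-cache payload into the ramfs here ...
    }
}

This only guards tasks on the same machine, which is the case that matters here; a crashed winner would leave a stale lock behind, so a production version would also want a timeout or cleanup step (e.g. Devaraj's third job).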
