You could make a ramfs file system (entirely in memory) on each node and configure HDFS to store its data there.
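Two caveats apply to this suggestion. First, ramfs contents disappear on reboot or unmount. Second, ramfs has no size limit, so a datanode can exhaust RAM. tmpfs is also RAM-backed (its pages can be swapped) and accepts a `size=` cap. In Hadoop releases of this period, the datanode storage directories are set with `dfs.data.dir` (later renamed `dfs.datanode.data.dir`). After reconfiguring, it is worth confirming that the data directory really sits on an in-memory mount. The Python sketch below does that check. It assumes a Linux node where `/proc/mounts` is readable, and the mount point `/mnt/hdfs-ram` used in the example is hypothetical.

```python
import os

# Filesystem types whose contents live in RAM.
IN_MEMORY_FS = {"ramfs", "tmpfs"}


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is the text of /proc/mounts: one mount per line with
    device, mount point, fs type, options, dump, pass. This sketch does not
    decode octal escapes (e.g. a space written as \\040 in a mount point).
    """
    path = os.path.abspath(path)
    best_point, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        point, fstype = fields[1], fields[2]
        # The mount covers `path` if it is the same directory or a parent of it.
        if path == point or path.startswith(point.rstrip("/") + "/"):
            # The longest matching mount point is the one in effect; on a tie,
            # the later entry wins because it was mounted over the earlier one.
            if len(point) >= len(best_point):
                best_point, best_type = point, fstype
    return best_type


def is_in_memory(path, mounts_text):
    """True if `path` lives on a ramfs or tmpfs mount."""
    return fs_type_for(path, mounts_text) in IN_MEMORY_FS
```

On a node, pass the live mount table, e.g. `is_in_memory("/mnt/hdfs-ram/data", open("/proc/mounts").read())`.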
-----Original Message-----
From: Shimi K [mailto:[EMAIL PROTECTED]
Sent: Sun 2/10/2008 10:05 PM
To: [email protected]
Subject: Re: Caching frequently map input files

I chose Hadoop more for the distributed computation than for its support for huge files, and my files do fit into memory. I have a lot of small files, and my system needs to search for something in those files very fast. I figured I could distribute the files on a Hadoop cluster and then use the distributed computation to search as many files in parallel as possible. This way I would be able to return a result faster than if I had used one machine.

Is there a way to tell which files are in memory?

On Feb 10, 2008 10:33 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:

> But if your files DO fit into memory, then the datanodes that have copies of
> the blocks of your file will probably still have them in memory, and since
> maps are typically data-local, you will benefit as much as possible.
>
> On 2/10/08 11:17 AM, "Arun C Murthy" <[EMAIL PROTECTED]> wrote:
>
> >> Does Hadoop cache map input files (frequently used/LRU/MRU)? Or does it
> >> load files from disk each time a file is needed, even if it is the same
> >> file that was required by the last job on the same node?
> >
> > There is no concept of caching input files across jobs.
> >
> > Hadoop is geared towards dealing with _huge_ amounts of data which
> > don't fit into memory anyway... and hence doing it across jobs is moot.
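Shimi's plan has the shape of a map step followed by a merge. The many small files are spread across workers, each worker searches its share, and the hits are combined. The Python sketch below shows only that fan-out/merge pattern on a single machine. It is not Hadoop code, and the names `search_file` and `parallel_search` are hypothetical. Threads stand in for cluster nodes here because reading small files is mostly I/O; CPU-heavy matching would need processes or, as in the thread, map tasks on separate nodes.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def search_file(args):
    """Map step: return (path, line_no, line) for each line containing `term`."""
    path, term = args
    hits = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for line_no, line in enumerate(f, start=1):
            if term in line:
                hits.append((path, line_no, line.rstrip("\n")))
    return hits


def parallel_search(paths, term, workers=4):
    """Fan the files out to a pool of workers, then merge their hits.

    Executor.map yields results in input order, so the merged list is
    ordered by file (as given) and then by line number.
    """
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for hits in pool.map(search_file, [(p, term) for p in paths]):
            results.extend(hits)
    return results
```

For example, `parallel_search(["a.txt", "b.txt"], "needle")` returns every matching line from both files, tagged with its file and line number.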
