RE: Caching frequently map input files

Joydeep Sen Sarma Mon, 11 Feb 2008 00:12:07 -0800

You could create a ramfs file system (entirely in memory) on each node and
configure HDFS to use it.

-----Original Message-----
From: Shimi K [mailto:[EMAIL PROTECTED]
Sent: Sun 2/10/2008 10:05 PM
To: [email protected]
Subject: Re: Caching frequently map input files

I chose Hadoop more for the distributed computation than for its support for
huge files, and my files do fit into memory.
I have a lot of small files, and my system needs to search for something in
those files very quickly. I figured I could distribute the files across a
Hadoop cluster and then use the distributed computation to search as many
files as possible in parallel. This way I would be able to return a result
faster than if I had used a single machine.
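
As a rough single-machine illustration of the search described above (not
Hadoop code), the per-file search can be treated as a map step that runs
concurrently over many small files. The names search_file and parallel_search
and the workers parameter are hypothetical, chosen only for this sketch; on a
cluster the same per-file function would run inside map tasks instead of a
thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def search_file(path, needle):
    """Map step: return (path, 1-based line numbers containing needle)."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    hits = [i for i, line in enumerate(text.splitlines(), start=1)
            if needle in line]
    return str(path), hits


def parallel_search(paths, needle, workers=8):
    """Search every file concurrently; keep only files with a match."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: search_file(p, needle), paths)
        return {path: hits for path, hits in results if hits}
```

Calling parallel_search(list_of_paths, "some text") returns a dict mapping each
matching file to the line numbers where the text occurs. Threads are used here
because the work is mostly file reads; that is a choice made for the sketch,
not something taken from this thread.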

Is there a way to tell which files are in memory?

On Feb 10, 2008 10:33 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:

>
> But if your files DO fit into memory, then the datanodes that hold copies
> of the blocks of your files will probably still have them in memory, and
> since maps are typically data-local, you will benefit as much as possible.
>
>
> On 2/10/08 11:17 AM, "Arun C Murthy" <[EMAIL PROTECTED]> wrote:
>
> >> Does Hadoop cache frequently used (LRU/MRU) map input files? Or does
> >> it load files from disk each time a file is needed, even if it is the
> >> same file that was required by the last job on the same node?
> >>
> >
> > There is no concept of caching input files across jobs.
> >
> > Hadoop is geared towards dealing with _huge_ amounts of data which
> > don't fit into memory anyway... and hence doing it across jobs is moot.
>
>
