On Wed, Oct 7, 2009 at 7:45 AM, Edward Capriolo <[email protected]> wrote:
> Todd,
>
> I do think it could be an inherent problem. With all the reading and
> writing of intermediate data Hadoop does, the file system cache would
> likely never contain the initial raw data you want to work with. The
> HBase RegionServer seems to be successful, so there must be some place
> for caching.
>
> Once I get something in HDFS, like the last hour's log data, about 40
> different processes are going to repeatedly re-read it from disk. I
> think if I can force that data into a cache I can get much faster
> processing.

In cases like this, we should expose access-type hints like posix_fadvise
POSIX_FADV_DONTNEED for the data we don't want to end up in the cache.
There's already a JIRA out there for a JNI library for platform-specific
optimizations, and I think this is one that will be worth doing.

-Todd
