Caching in-process like this is likely to give much better results than
an external caching process.  Also, caching structures with repetitive
access patterns is obviously better than caching data that is read only
once.  Thus caching small side data works well.  Caching map inputs does
not, since each input record is typically read a single time.
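
To make the side-data case concrete, here is a rough sketch of the kind of
in-process cache I have in mind.  It is not code from Mahout or EHcache; the
class name SideDataCache and the loader function are made up for
illustration, and the loader just stands in for whatever slow lookup you
have (an HBase get, say).  It is a plain LRU built on java.util.LinkedHashMap
and it is not thread-safe, so treat it as one instance per mapper.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Minimal in-process LRU cache in front of a slow lookup (hypothetical example). */
public class SideDataCache<K, V> {
    private final Function<K, V> loader; // assumption: wraps the external store lookup
    private final Map<K, V> entries;
    private long hits = 0;
    private long misses = 0;

    public SideDataCache(final int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder=true keeps entries ordered from least to most recently used
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // evict the least recently used entry once we exceed capacity
                return size() > capacity;
            }
        };
    }

    /** Returns the cached value, calling the loader only on a miss. */
    public V get(K key) {
        V value = entries.get(key);
        if (value != null) {
            hits++;
            return value;
        }
        misses++;
        value = loader.apply(key);
        if (value != null) {
            entries.put(key, value);
        }
        return value;
    }

    public long hits() { return hits; }

    public long misses() { return misses; }
}
```

With repetitive access (a few hot keys looked up over and over) nearly every
call is a hit.  With single-access data such as map inputs every call is a
miss, so the cache only adds overhead.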

On Sat, Sep 10, 2011 at 6:28 PM, Robin Anil <[email protected]> wrote:

> I once wrote a simple cache for HBaseDatastore in the naive Bayes classifier
> package, and yes, the speedup was really awesome: the weights of high-frequency
> words got cached, and the incremental lookup cost for the rest of the words in
> a document was really low. I had posted numbers on the old JIRA ticket.
>  On Sep 11, 2011 12:36 AM, "Dhruv Kumar" <[email protected]> wrote:
> > Has anyone over here used EHcache with Mahout (or pure Hadoop jobs)?
> >
> > http://ehcache.org/
> >
> > For iterative MapReduce applications running on a NoSQL data store, it
> > should provide a good performance boost by providing an in-memory object
> > cache (I think). Any comments?
>
