This is really a question of how you store your input for Hadoop; I
don't think it's tied directly to Mahout.

NoSQL data stores are good at fast random access. The Hadoop input
model is much more about sequential reads. So you can certainly read
from Cassandra, but Cassandra's nice properties aren't really being
used in that case.

Ehcache would, if anything, only speed up random access, which would
not really help here.

I can think of several uses for Ehcache, but this might not quite be
one of them. For example, many M/R jobs 'cheat' by caching side
information that they read repeatedly, for performance. I'd bet it
would be useful there.
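
To make the side-information case concrete, here is a minimal sketch in
plain Java. It is not Ehcache or Hadoop code: SideDataCache, its
constructor arguments and the String::length "backing store" are all
made up for illustration, with a small LRU map standing in for whatever
cache you would actually use. The point is only that repeated lookups of
the same side data stop hitting the slow store.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: a small LRU cache in front of a slow side-data lookup. */
final class SideDataCache<K, V> {
    // Stand-in for the slow lookup (e.g. a read from a key-value store). Assumption.
    private final Function<K, V> backingStore;
    private final Map<K, V> cache;
    private int misses = 0;

    SideDataCache(Function<K, V> backingStore, final int maxEntries) {
        this.backingStore = backingStore;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once the cache is over capacity.
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value, falling back to the backing store on a miss. */
    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            misses++;
            value = backingStore.apply(key);
            cache.put(key, value);
        }
        return value;
    }

    /** Number of lookups that had to go to the backing store. */
    int misses() {
        return misses;
    }

    public static void main(String[] args) {
        // Pretend String::length is an expensive remote lookup.
        SideDataCache<String, Integer> sideData = new SideDataCache<>(String::length, 2);
        String[] keys = {"a", "bb", "a", "a", "ccc", "bb"};
        for (String key : keys) {
            sideData.get(key);
        }
        System.out.println("lookups=" + keys.length + " misses=" + sideData.misses());
    }
}
```

With a capacity of 2, four of the six lookups above go to the backing
function and two are served from memory. In a real job you would build
something like this once per task (say, in a mapper's setup) and call
get() from map(); whether it pays off depends on how often the same keys
recur within a task.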

On Sat, Sep 10, 2011 at 8:43 PM, Dhruv Kumar <[email protected]> wrote:
> Well, my understanding was that Ehcache allows name-value pairs to be stored
> in-memory, reducing disk transactions. So, if I put Ehcache on top of a
> NoSQL persistence store such as Cassandra which is also a key-value store,
> it should speed up the performance of a MapReduce app.
>
> On Sat, Sep 10, 2011 at 3:32 PM, Sean Owen <[email protected]> wrote:
>
>> What are you thinking it might cache?
>>
>> On Sat, Sep 10, 2011 at 8:06 PM, Dhruv Kumar <[email protected]> wrote:
>> > Has anyone over here used EHcache with Mahout (or pure Hadoop jobs)?
>> >
>> > http://ehcache.org/
>> >
>> > For iterative MapReduce applications running on a NoSQL data store, it
>> > should provide a good performance boost by providing an in-memory object
>> > cache (I think). Any comments?
>> >
>>
>
