This is more a question of how you store your input to Hadoop; I don't think it's directly tied to Mahout.
NoSQL data stores are good at fast random access, whereas the Hadoop model for input is much more about sequential reads. So you can certainly read from Cassandra, but Cassandra's nice properties aren't really being used in that case. Ehcache would, if anything, only help speed up random access, which would not really help here.

I can think of several uses for Ehcache, but this might not quite be one of them. For example, many M/R jobs 'cheat' by trying to cache and read side information for performance. You can bet it would be useful there.

On Sat, Sep 10, 2011 at 8:43 PM, Dhruv Kumar <[email protected]> wrote:
> Well, my understanding was that Ehcache allows name-value pairs to be stored
> in-memory, reducing disk transactions. So, if I put Ehcache on top of a
> NoSQL persistence store such as Cassandra which is also a key-value store,
> it should speed up the performance of a MapReduce app.
>
> On Sat, Sep 10, 2011 at 3:32 PM, Sean Owen <[email protected]> wrote:
>
>> What are you thinking it might cache?
>>
>> On Sat, Sep 10, 2011 at 8:06 PM, Dhruv Kumar <[email protected]> wrote:
>> > Has anyone over here used EHcache with Mahout (or pure Hadoop jobs)?
>> >
>> > http://ehcache.org/
>> >
>> > For iterative MapReduce applications running on a NoSQL data store, it
>> > should provide a good performance boost by providing an in-memory object
>> > cache (I think). Any comments?
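
P.S. To make the side-information case from the reply a bit more concrete, here is a rough sketch in plain Java of the kind of in-memory cache a map task might keep in front of a slow lookup. It uses neither Ehcache's API nor Hadoop's: the class name SideDataCache and the loader function are made up for illustration, and the loader stands in for whatever random-access read (say, a key lookup against a NoSQL store) the job would otherwise repeat.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical LRU cache for side-data lookups inside a single task. */
class SideDataCache<K, V> {
    private final Function<K, V> loader; // assumed slow lookup, e.g. a remote read
    private final Map<K, V> cache;
    private int loads = 0;               // how many times the slow path was taken

    SideDataCache(final int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder=true keeps entries ordered from least to most recently used
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // evict the least recently used entry once the cap is exceeded
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value, calling the loader only on a miss. */
    V get(K key) {
        V value = cache.get(key);
        if (value == null) {             // note: null results are never cached
            value = loader.apply(key);
            loads++;
            cache.put(key, value);
        }
        return value;
    }

    int loads() {
        return loads;
    }
}
```

A task would create one of these in its setup step and call get() for each record that needs side data; repeated keys then skip the slow lookup. This only pays off when the same keys recur within a task, which is the random-access pattern a cache helps with, as opposed to the sequential scan of the main input.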
