I'm afraid I might have been a bit unclear (the different layers of Mahout are still very new to me, my apologies). What I had in mind was basically to provide a replacement for, say, org.apache.mahout.cf.taste.impl.recommender.slopeone.jdbc.* and org.apache.mahout.cf.taste.impl.similarity.jdbc.*
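To make the idea a bit more concrete, here is a rough sketch of what a key-value-backed stand-in for the slope-one diff storage might look like. `KeyValueStore`, `InMemoryStore` and `KeyValueDiffStorage` are hypothetical names of mine, not Mahout's API; the in-memory map only simulates a distributed store such as Cassandra, and each item pair keeps its running average diff under a single key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical minimal key-value interface standing in for a distributed store.
interface KeyValueStore {
    Optional<String> get(String key);
    void put(String key, String value);
}

// In-memory stand-in for illustration only; a real deployment would talk to the cluster.
class InMemoryStore implements KeyValueStore {
    private final Map<String, String> data = new HashMap<>();

    @Override
    public Optional<String> get(String key) {
        return Optional.ofNullable(data.get(key));
    }

    @Override
    public void put(String key, String value) {
        data.put(key, value);
    }
}

// Hypothetical slope-one diff storage: one "count,averageDiff" value per item pair.
class KeyValueDiffStorage {
    private final KeyValueStore store;

    KeyValueDiffStorage(KeyValueStore store) {
        this.store = store;
    }

    // Canonical key with the smaller item ID first, so (a, b) and (b, a) share an entry.
    private static String key(long itemA, long itemB) {
        return "diff:" + Math.min(itemA, itemB) + ":" + Math.max(itemA, itemB);
    }

    // Records one observed diff, rating(itemB) - rating(itemA), as a running average.
    void addDiff(long itemA, long itemB, double diff) {
        // Stored averages are oriented from the smaller ID to the larger ID.
        double canonical = itemA <= itemB ? diff : -diff;
        String k = key(itemA, itemB);
        long count = 0;
        double avg = 0.0;
        Optional<String> existing = store.get(k);
        if (existing.isPresent()) {
            String[] parts = existing.get().split(",");
            count = Long.parseLong(parts[0]);
            avg = Double.parseDouble(parts[1]);
        }
        count++;
        avg += (canonical - avg) / count;
        store.put(k, count + "," + avg);
    }

    // Average of rating(itemB) - rating(itemA), if any diffs were recorded for the pair.
    Optional<Double> averageDiff(long itemA, long itemB) {
        return store.get(key(itemA, itemB)).map(v -> {
            double avg = Double.parseDouble(v.split(",")[1]);
            return itemA <= itemB ? avg : -avg;
        });
    }
}
```

Keeping one canonical key per pair means a lookup is a single get, which is the kind of access pattern a key-value store handles well.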
The data structure used by these classes is very simple, so I thought it might make sense to store it in a process with less overhead than a full-blown RDBMS. The advantages of using a distributed key-value system for this seemed obvious to me: several nodes provide resilience and allow for scalability on the querying side of things...

2010/5/31 Ted Dunning <[email protected]>

> Hmm....
>
> I have used Lucene very effectively in item-based recommendation settings
> before and user-based recommendations were marginally acceptable.
>
> All that the data store has to do is fetch the lists of related items for
> the most important few items in the persons history. With a multi-fetch
> operation (which Cassandra may or may not have), this is one server
> round-trip. It is definitely much faster to keep lots of items in memory,
> though.
>
> The off-line processing to build the item-item relationships, however, would
> require a scan of all user profiles which may be a bit intense, especially
> if the NOSQL store is being used at the same time to service user requests.
> I have found it preferable to grovel some form of log file in HDFS to get
> this information in the past.
>
> On Mon, May 31, 2010 at 9:00 AM, Sean Owen <[email protected]> wrote:
>
> > So, any such data store is way too slow to use with a real-time
> > recommender.
> >
> > But a distributed algorithm? sure. As you say, the distributed version
> > runs on Hadoop, and you can transfer between HDFS and Cassandra. Not
> > sure whether to call that integration -- there's nothing the project
> > would meaningfully do here, since it reads off HDFS.
> >
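Ted's point about fetching related items for the few most important items in a user's history can be sketched roughly like this. The names (`RelatedItemStore`, `multiGet`, `ItemBasedScorer`) are hypothetical, the item-item similarity lists are assumed to have been computed offline already, and the in-memory store only simulates a multi-fetch; whether Cassandra offers one is left open, as in Ted's mail.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical store of precomputed item -> (related item -> similarity) lists.
interface RelatedItemStore {
    // Fetches related-item lists for several items at once (ideally one round-trip).
    Map<Long, Map<Long, Double>> multiGet(Collection<Long> itemIds);
}

// In-memory simulation of the multi-fetch, for illustration only.
class InMemoryRelatedItemStore implements RelatedItemStore {
    private final Map<Long, Map<Long, Double>> data;

    InMemoryRelatedItemStore(Map<Long, Map<Long, Double>> data) {
        this.data = data;
    }

    @Override
    public Map<Long, Map<Long, Double>> multiGet(Collection<Long> itemIds) {
        Map<Long, Map<Long, Double>> out = new HashMap<>();
        for (Long id : itemIds) {
            Map<Long, Double> related = data.get(id);
            if (related != null) {
                out.put(id, related);
            }
        }
        return out;
    }
}

class ItemBasedScorer {
    // Scores unseen items using only the user's maxSeeds highest-weighted history items.
    static List<Long> recommend(Map<Long, Double> history, RelatedItemStore store,
                                int maxSeeds, int howMany) {
        // Pick the "most important few items" by the user's own weight (e.g. rating).
        List<Long> seeds = history.entrySet().stream()
                .sorted(Map.Entry.<Long, Double>comparingByValue().reversed())
                .limit(maxSeeds)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
        // A single multi-fetch for all seed items.
        Map<Long, Map<Long, Double>> related = store.multiGet(seeds);
        Map<Long, Double> scores = new HashMap<>();
        for (Long seed : seeds) {
            double weight = history.get(seed);
            for (Map.Entry<Long, Double> e : related.getOrDefault(seed, Map.of()).entrySet()) {
                if (history.containsKey(e.getKey())) {
                    continue; // skip items the user already has
                }
                scores.merge(e.getKey(), weight * e.getValue(), Double::sum);
            }
        }
        return scores.entrySet().stream()
                .sorted(Map.Entry.<Long, Double>comparingByValue().reversed())
                .limit(howMany)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Weighting each neighbor's similarity by the seed item's weight and summing is just one simple aggregation; the point is that the store only has to answer one batched lookup per request.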

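For the off-line side, a single-machine sketch of grovelling a log instead of scanning user profiles might look like the following. It assumes a hypothetical log format of one `userId,itemId` event per line and uses plain co-occurrence counts as the item-item relationship, which is only one possible choice; at real scale this would be a Hadoop job over HDFS rather than an in-memory loop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class CooccurrenceBuilder {
    // Counts, for each item pair "a:b" (a < b), how many users have both items.
    static Map<String, Integer> countPairs(List<String> logLines) {
        // Group distinct items per user; repeated events for the same item collapse.
        Map<Long, Set<Long>> itemsByUser = new HashMap<>();
        for (String line : logLines) {
            String[] parts = line.split(",");
            if (parts.length != 2) {
                continue; // skip lines without exactly two fields
            }
            long user = Long.parseLong(parts[0].trim());
            long item = Long.parseLong(parts[1].trim());
            itemsByUser.computeIfAbsent(user, u -> new TreeSet<>()).add(item);
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<Long> items : itemsByUser.values()) {
            List<Long> sorted = new ArrayList<>(items); // TreeSet keeps IDs ascending
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    counts.merge(sorted.get(i) + ":" + sorted.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

The resulting counts could then be normalized into similarities and written to the key-value store, keeping the heavy scan away from the nodes serving live requests.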