I'm afraid I might have been a bit unclear (the different layers of Mahout are
still very new to me, my apologies). What I had in mind was basically to
provide a replacement for, say,
org.apache.mahout.cf.taste.impl.recommender.slopeone.jdbc.*
and
org.apache.mahout.cf.taste.impl.similarity.jdbc.*

The data structures used by these classes are very simple, so I thought it
might make sense to store them in a process with less overhead than a
full-blown RDBMS.
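To illustrate how simple that data is, here is a minimal sketch, assuming slope-one statistics are kept per item pair as a count of users who rated both items plus a running sum of their rating differences (enough to recover the average difference). The class name SlopeOneDiffStore and the "lo:hi" key format are hypothetical, this is not Mahout's implementation, and a ConcurrentHashMap stands in for the remote key-value store. Because average differences are antisymmetric, only one entry per unordered pair is stored and the sign is flipped on lookup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical slope-one diff table backed by a key-value map (stand-in for a remote store). */
final class SlopeOneDiffStore {
  /** Immutable value: how many users rated both items and the sum of their differences. */
  private static final class DiffEntry {
    final long count;
    final double sum;

    DiffEntry(long count, double sum) {
      this.count = count;
      this.sum = sum;
    }
  }

  private final Map<String, DiffEntry> store = new ConcurrentHashMap<>();

  // Canonical key with the smaller item ID first, so each unordered pair has one entry.
  static String key(long itemA, long itemB) {
    return Math.min(itemA, itemB) + ":" + Math.max(itemA, itemB);
  }

  /** Records one user's ratings for two different items. */
  void addRatings(long itemA, double ratingA, long itemB, double ratingB) {
    if (itemA == itemB) {
      return;
    }
    // Orient every difference as (rating of lower ID) - (rating of higher ID).
    double diff = itemA < itemB ? ratingA - ratingB : ratingB - ratingA;
    store.merge(key(itemA, itemB), new DiffEntry(1, diff),
        (old, add) -> new DiffEntry(old.count + add.count, old.sum + add.sum));
  }

  /** Average of (rating of itemA - rating of itemB), or NaN if no user rated both. */
  double averageDiff(long itemA, long itemB) {
    DiffEntry entry = store.get(key(itemA, itemB));
    if (entry == null) {
      return Double.NaN;
    }
    double avg = entry.sum / entry.count;
    // Differences are antisymmetric: diff(a, b) == -diff(b, a).
    return itemA < itemB ? avg : -avg;
  }
}
```

A table of item-item similarities would use the same keying without the sign flip, assuming a symmetric similarity measure.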
The advantages of using a distributed key-value system for these seemed
obvious to me: several nodes would provide resilience and allow the
querying side of things to scale.
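One common way a distributed key-value store can spread keys over several nodes with replication is a consistent-hashing ring. The sketch below is generic and hypothetical (the class ReplicatedRing, the virtual-node count, and the bit-mixing hash are all assumptions), not a description of how Cassandra or any other specific store places data. Each key goes to the first R distinct nodes found walking clockwise from the key's hash, so losing one node still leaves R - 1 copies, and adding a node only reassigns keys in the ring ranges next to its new positions.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical consistent-hashing ring that places each key on several distinct nodes. */
final class ReplicatedRing {
  private final TreeMap<Long, String> ring = new TreeMap<>();
  private final int replicas;

  ReplicatedRing(Collection<String> nodes, int virtualNodesPerNode, int replicas) {
    if (replicas < 1 || replicas > nodes.size()) {
      throw new IllegalArgumentException("replicas must be between 1 and the number of nodes");
    }
    this.replicas = replicas;
    // Each physical node gets several positions ("virtual nodes") on the ring.
    for (String node : nodes) {
      for (int v = 0; v < virtualNodesPerNode; v++) {
        ring.put(hash(node + "#" + v), node);
      }
    }
  }

  // Spreads String.hashCode over 64 bits with a MurmurHash3-style finalizer.
  static long hash(String s) {
    long h = s.hashCode();
    h ^= h >>> 33;
    h *= 0xff51afd7ed558ccdL;
    h ^= h >>> 33;
    h *= 0xc4ceb9fe1a85ec53L;
    h ^= h >>> 33;
    return h;
  }

  /** Returns the distinct nodes responsible for a key, primary first. */
  List<String> nodesFor(String key) {
    List<String> owners = new ArrayList<>(replicas);
    long start = hash(key);
    // Walk clockwise from the key's position, wrapping around to the start of the ring.
    // (Copying the ring on every call is fine for a sketch, not for production.)
    List<String> walk = new ArrayList<>(ring.tailMap(start, true).values());
    walk.addAll(ring.headMap(start, false).values());
    for (String node : walk) {
      if (!owners.contains(node)) {
        owners.add(node);
        if (owners.size() == replicas) {
          break;
        }
      }
    }
    return owners;
  }
}
```

For instance, with four nodes and replicas = 3, nodesFor returns three distinct node names for any key, always the same three for the same key.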


2010/5/31 Ted Dunning <[email protected]>

> Hmm....
>
> I have used Lucene very effectively in item-based recommendation settings
> before, and user-based recommendations were marginally acceptable.
>
> All that the data store has to do is fetch the lists of related items for
> the most important few items in the person's history. With a multi-fetch
> operation (which Cassandra may or may not have), this is one server
> round-trip.  It is definitely much faster to keep lots of items in memory,
> though.
>
> The off-line processing to build the item-item relationships, however,
> would require a scan of all user profiles, which may be a bit intense,
> especially if the NoSQL store is being used at the same time to service
> user requests. In the past, I have found it preferable to grovel some form
> of log file in HDFS to get this information.
>
> On Mon, May 31, 2010 at 9:00 AM, Sean Owen <[email protected]> wrote:
>
> > So, any such data store is way too slow to use with a real-time
> > recommender.
> >
> > But a distributed algorithm? Sure. As you say, the distributed version
> > runs on Hadoop, and you can transfer between HDFS and Cassandra. Not
> > sure whether to call that integration -- there's nothing the project
> > would meaningfully do here, since it reads off HDFS.
> >
>
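Ted's two steps, building item-item relationships off-line from a log and then serving item-based recommendations with a single multi-fetch, can be sketched together as follows. Several assumptions are made: the RelatedItemsStore interface and its multiGet method are hypothetical (not Cassandra's or Mahout's API), the "userID,itemID" log format is made up, the log lines are assumed to be already read from HDFS, raw co-occurrence counts stand in for whatever relatedness score a real off-line job would compute, and the off-line pass runs on one machine rather than as a Hadoop job.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical store API: related-item lists for many items in one round-trip. */
interface RelatedItemsStore {
  Map<Long, Map<Long, Double>> multiGet(Collection<Long> itemIds);
}

/** In-memory stand-in for the NoSQL store, filled by an off-line pass over a log. */
final class InMemoryRelatedItems implements RelatedItemsStore {
  private final Map<Long, Map<Long, Double>> related = new HashMap<>();
  int roundTrips = 0; // number of multiGet calls (simulated server round-trips)

  /** Off-line step: count item co-occurrences per user from "userID,itemID" lines. */
  void buildFromLog(List<String> logLines) {
    Map<Long, Set<Long>> itemsByUser = new HashMap<>();
    for (String line : logLines) {
      String[] fields = line.split(",");
      if (fields.length != 2) continue; // skip malformed lines
      long user = Long.parseLong(fields[0].trim());
      long item = Long.parseLong(fields[1].trim());
      itemsByUser.computeIfAbsent(user, u -> new HashSet<>()).add(item);
    }
    for (Set<Long> items : itemsByUser.values()) {
      for (Long a : items) {
        for (Long b : items) {
          if (!a.equals(b)) {
            related.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1.0, Double::sum);
          }
        }
      }
    }
  }

  @Override
  public Map<Long, Map<Long, Double>> multiGet(Collection<Long> itemIds) {
    roundTrips++;
    Map<Long, Map<Long, Double>> result = new HashMap<>();
    for (Long id : itemIds) {
      Map<Long, Double> list = related.get(id);
      if (list != null) result.put(id, list);
    }
    return result;
  }
}

final class ItemBasedRecommender {
  /** On-line step: score candidates from the related lists of the user's top items. */
  static List<Long> recommend(RelatedItemsStore store, Map<Long, Double> history,
                              int historyItemsToUse, int howMany) {
    // Keep only the most important few items from the user's history.
    List<Long> topItems = history.entrySet().stream()
        .sorted(Map.Entry.<Long, Double>comparingByValue().reversed())
        .limit(historyItemsToUse)
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());

    // A single multi-fetch retrieves every related-item list needed.
    Map<Long, Map<Long, Double>> related = store.multiGet(topItems);

    // Sum preference * relatedness per candidate, skipping items already in the history.
    Map<Long, Double> scores = new HashMap<>();
    for (Long item : topItems) {
      double preference = history.get(item);
      for (Map.Entry<Long, Double> e : related.getOrDefault(item, Map.of()).entrySet()) {
        if (!history.containsKey(e.getKey())) {
          scores.merge(e.getKey(), preference * e.getValue(), Double::sum);
        }
      }
    }

    // Highest-scoring candidates first.
    return scores.entrySet().stream()
        .sorted(Map.Entry.<Long, Double>comparingByValue().reversed())
        .limit(howMany)
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }
}
```

For example, with the log lines 1,10 / 1,20 / 2,10 / 2,20 / 2,30 / 3,30 / 3,40 and a history of {10: 5.0, 30: 1.0, 99: 0.5}, recommend(store, history, 2, 2) uses only items 10 and 30, makes exactly one multiGet call, and returns [20, 40].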
