Hmm...

I have used Lucene very effectively in item-based recommendation settings
before, and user-based recommendations with it were marginally acceptable.

All that the data store has to do is fetch the lists of related items for
the most important few items in the person's history.  With a multi-fetch
operation (which Cassandra may or may not have), this is one server
round-trip.  It is definitely much faster to keep lots of items in memory,
though.
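
As a rough illustration of that serving path, here is a minimal sketch in
Python. It is not Mahout or Cassandra code: the function name `recommend`,
the weight/score layout, and the plain dict standing in for the data store
are all assumptions made for the example. The dict lookup marks the spot
where a single multi-fetch round-trip would go.

```python
from collections import defaultdict


def recommend(history, related_items, top_n=3, max_seeds=2):
    """Score candidates from the related-item lists of a few seed items.

    history:       list of (item, weight) pairs for one person (hypothetical layout)
    related_items: dict mapping item -> list of (related_item, score);
                   a stand-in for the real data store
    """
    # Keep only the most important few items from the person's history.
    seeds = sorted(history, key=lambda pair: pair[1], reverse=True)[:max_seeds]
    seed_ids = [item for item, _ in seeds]

    # Stand-in for one multi-fetch round-trip to the data store.
    fetched = {item: related_items.get(item, []) for item in seed_ids}

    # Do not recommend things the person already has.
    seen = {item for item, _ in history}

    scores = defaultdict(float)
    for item, weight in seeds:
        for related, score in fetched[item]:
            if related not in seen:
                scores[related] += weight * score

    # Highest score first; ties broken by item name for stable output.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]
```

Only the seed items' lists are ever fetched, so the cost per request stays
small no matter how large the catalogue is.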

The off-line processing to build the item-item relationships, however, would
require a scan of all user profiles, which may be a bit intense, especially
if the NoSQL store is being used at the same time to service user requests.
In the past, I have found it preferable to grovel through some form of log
file in HDFS to get this information.
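
To make the off-line step concrete, here is a small single-machine sketch
in Python of building item-item lists from log lines. The tab-separated
"user, item" log format, the function name `build_related_items`, and the
use of raw co-occurrence counts as the relatedness score are all
assumptions for illustration; at real scale this would be a Hadoop job
over the logs in HDFS, and a better similarity measure would likely
replace the raw counts.

```python
from collections import defaultdict
from itertools import combinations


def build_related_items(log_lines, top_k=10):
    """Build item -> [(related_item, count), ...] from 'user<TAB>item' lines."""
    # Group the items each user touched (assumed log format: user<TAB>item).
    items_by_user = defaultdict(set)
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        user, item = line.split("\t")
        items_by_user[user].add(item)

    # Count how many users have each pair of items in common.
    counts = defaultdict(lambda: defaultdict(int))
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1

    # Keep only the top_k strongest neighbours per item.
    related = {}
    for item, neighbours in counts.items():
        ranked = sorted(neighbours.items(), key=lambda pair: (-pair[1], pair[0]))
        related[item] = ranked[:top_k]
    return related
```

The output is exactly the kind of per-item list that the serving side
fetches, so it can be bulk-loaded into the data store without scanning
live user profiles.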

On Mon, May 31, 2010 at 9:00 AM, Sean Owen <[email protected]> wrote:

> So, any such data store is way too slow to use with a real-time
> recommender.
>
> But a distributed algorithm? sure. As you say, the distributed version
> runs on Hadoop, and you can transfer between HDFS and Cassandra. Not
> sure whether to call that integration -- there's nothing the project
> would meaningfully do here, since it reads off HDFS.
>
