Hmm.... I have used Lucene very effectively in item-based recommendation settings before, and user-based recommendations were marginally acceptable.

All that the data store has to do is fetch the lists of related items for the few most important items in the person's history. With a multi-fetch operation (which Cassandra may or may not have), this is one server round-trip. It is definitely much faster to keep lots of items in memory, though.

The off-line processing to build the item-item relationships, however, would require a scan of all user profiles, which may be a bit intense, especially if the NoSQL store is being used at the same time to service user requests. In the past, I have found it preferable to grovel through some form of log file in HDFS to get this information.

On Mon, May 31, 2010 at 9:00 AM, Sean Owen wrote:
> So, any such data store is way too slow to use with a real-time
> recommender.
>
> But a distributed algorithm? sure. As you say, the distributed version
> runs on Hadoop, and you can transfer between HDFS and Cassandra. Not
> sure whether to call that integration -- there's nothing the project
> would meaningfully do here, since it reads off HDFS.
>
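To make the split between the off-line and on-line work concrete, here is a minimal sketch in Python. It is not Mahout or Lucene code, and all names (`build_related_items`, `recommend`, `top_history`, etc.) are hypothetical. Assumptions: user histories are already parsed into a dict (e.g. from a log file in HDFS), raw co-occurrence counts stand in for a proper item-item similarity score, the "most important" history items are simply the most recent ones, and a plain dict lookup stands in for the store's multi-fetch.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_related_items(histories, max_related=10):
    """Off-line step (hypothetical): scan every user history once, count
    item-item co-occurrence, and keep the top related items per item.
    Raw counts are an assumption; a real system would use a better score."""
    cooccur = defaultdict(Counter)
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            cooccur[a][b] += 1
            cooccur[b][a] += 1
    return {item: [other for other, _ in counts.most_common(max_related)]
            for item, counts in cooccur.items()}


def recommend(history, related, top_history=3, n=5):
    """On-line step (hypothetical): fetch the related-item lists for the few
    most important history items (assumed here: the most recent ones),
    merge them, and drop items the user has already seen."""
    important = history[-top_history:]
    # Stand-in for a single multi-fetch round-trip to the data store.
    fetched = {item: related.get(item, []) for item in important}
    seen = set(history)
    scores = Counter()
    for related_list in fetched.values():
        for rank, other in enumerate(related_list):
            if other not in seen:
                # Items higher in a related list contribute more.
                scores[other] += 1.0 / (rank + 1)
    return [item for item, _ in scores.most_common(n)]


# Tiny example data (made up for illustration).
histories = {"u1": ["a", "b", "c"], "u2": ["a", "b"], "u3": ["b", "d"]}
related = build_related_items(histories)
print(recommend(["c", "d"], related))  # → ['b', 'a']
```

Note that only `build_related_items` touches every user profile; `recommend` needs just the handful of related lists for one user, which is why the on-line side can be a single fetch while the expensive scan runs elsewhere.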
