Neither MongoDB nor Tokyo Tyrant seems to have much support for Hadoop-based programs. That, combined with the typically low throughput for spilling columns out of systems designed for other purposes, probably makes these systems a bit problematic for many machine learning applications, even with sequential learning algorithms like sequential gradient descent. I don't know enough to say whether data dumps to these systems are handled efficiently.
Other NoSQL data stores are much more Hadoop friendly. These include Cassandra, Voldemort and HBase. Voldemort and Cassandra are designed with web serving in mind, but Cassandra is column based, which can lead to good dump speed, and Voldemort can atomically update the entire database, which is cool for deploying a whole new set of recommendations. HBase was designed with Hadoop integration in mind and thus works quite well with map-reduce programs such as Mahout's parallel algorithms, for example the NaiveBayes classifier.

With respect to recommendations, it should be pretty easy to add support for keeping raw recommendations in any data store that you like. Typically, though, a higher performance store is required for recommendations to be served fast enough.

On Tue, Aug 3, 2010 at 8:08 PM, Saikat Kanjilal <[email protected]> wrote:

> ... Additionally I was wondering whether there is talk about building
> mahout on top of other nosql databases such as mongodb or tokyotyrant.
>
