Re: Hello Folks

Ted Dunning Tue, 03 Aug 2010 20:21:38 -0700

Neither MongoDB nor Tokyo Tyrant seems to have much support for Hadoop-based
programs.  That, together with the typically low throughput for dumping
columns out of systems designed for other purposes, probably makes these
systems a bit problematic for many machine learning applications, even with
sequential learning algorithms like sequential gradient descent.  I don't
know enough to say whether data dumps to these systems are handled
efficiently.

Other NoSQL data stores are much more Hadoop-friendly.  These include
Cassandra, Voldemort and HBase.  Voldemort and Cassandra are designed with
web services in mind, but Cassandra is column-based, which can lead to good
dump speed, and Voldemort can atomically update the entire database, which
is cool for deploying a whole new set of recommendations.  HBase was
designed with Hadoop integration in mind and thus works quite well with
map-reduce programs such as the parallel algorithms in Mahout, for example
the NaiveBayes classifier.
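
To make the "atomically updating the entire database" idea concrete, here is
a minimal in-memory sketch in Python.  It is not Voldemort's API; the class
name SwappableRecommendations and its methods are hypothetical and only
illustrate the pattern of building a complete new set off to the side and
then publishing it in one step.

```python
import threading


class SwappableRecommendations:
    """Hypothetical stand-in for a store that replaces its whole data set at once."""

    def __init__(self, initial=None):
        self._lock = threading.Lock()
        self._current = dict(initial or {})

    def get(self, user_id):
        # Grab one reference to the current set; a reader sees either the
        # old set or the new one, never a mixture of both.
        snapshot = self._current
        return snapshot.get(user_id, [])

    def swap(self, new_recommendations):
        # Build the complete replacement first, then publish it in one step.
        fresh = dict(new_recommendations)
        with self._lock:
            old, self._current = self._current, fresh
        return old  # returned so the caller could roll back if needed


store = SwappableRecommendations({"u1": ["a"]})
previous = store.swap({"u1": ["b"], "u2": ["c"]})
```

Returning the previous set is a design choice of this sketch: it leaves room
for rolling back a bad batch of recommendations.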

With respect to recommendations, it should be pretty easy to add support for
keeping raw recommendations in any data store that you like.  Typically,
though, a higher-performance store is required for recommendations to be
fast enough.
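
A rough Python sketch of what that could look like, assuming a small store
interface with a faster layer in front.  None of these names
(RecommendationStore, DictStore, CachedStore) come from Mahout; they are
made up for illustration, and DictStore merely stands in for whatever
backing data store you prefer.

```python
from abc import ABC, abstractmethod


class RecommendationStore(ABC):
    """Hypothetical interface: any data store can sit behind it."""

    @abstractmethod
    def put(self, user_id, items):
        ...

    @abstractmethod
    def get(self, user_id):
        ...


class DictStore(RecommendationStore):
    """Stand-in for a slower backing store; counts reads for illustration."""

    def __init__(self):
        self._rows = {}
        self.reads = 0

    def put(self, user_id, items):
        self._rows[user_id] = list(items)

    def get(self, user_id):
        self.reads += 1
        return list(self._rows.get(user_id, []))


class CachedStore(RecommendationStore):
    """Keeps recently served recommendations in memory in front of a backing store."""

    def __init__(self, backing):
        self._backing = backing
        self._cache = {}

    def put(self, user_id, items):
        # Write through to the backing store and refresh the cached copy.
        self._backing.put(user_id, items)
        self._cache[user_id] = list(items)

    def get(self, user_id):
        # Only go to the backing store on a cache miss.
        if user_id not in self._cache:
            self._cache[user_id] = self._backing.get(user_id)
        return list(self._cache[user_id])


backing = DictStore()
backing.put("u1", [("item9", 0.8)])
store = CachedStore(backing)
first = store.get("u1")
second = store.get("u1")
```

The second lookup is served from memory, so the backing store is read only
once.  The cache here is unbounded; a real deployment would want eviction.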

On Tue, Aug 3, 2010 at 8:08 PM, Saikat Kanjilal <[email protected]> wrote:

> ...  Additionally I was wondering whether there is talk about building
> mahout on top of other nosql databases such as mongodb or tokyotyrant.
>
