The difference being that we focus on being scalable.  This might involve
Hadoop for some, all, or none of the steps.

My definition of scalable is "handles data as big as nearly anybody
produces".  That may or may not require Hadoop.  Many online learning
systems are so fast that a single machine can munch near-Google-scale
amounts of data in a few hours.  Other algorithms might require Hadoop
for an aggregation step, but nothing else.  Still others might depend on
a cluster of Lucene nodes.
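
To see why a single machine can keep up: an online learner reads each
example once and does a constant, tiny amount of arithmetic, so throughput
is bounded by I/O rather than CPU.  Here is a minimal sketch (hypothetical
plain Java, not Mahout code) of stochastic gradient descent for logistic
regression over a simulated stream:

    import java.util.Random;

    public class OnlineSgdSketch {
        public static void main(String[] args) {
            int dims = 10;
            double[] weights = new double[dims];
            double learningRate = 0.01;
            Random rng = new Random(42);

            // Simulate a stream of labeled examples; in practice this
            // would be a file or socket read sequentially.
            for (long i = 0; i < 10000000L; i++) {
                double[] x = new double[dims];
                for (int d = 0; d < dims; d++) {
                    x[d] = rng.nextGaussian();
                }
                // Synthetic label: separable on the first two features.
                int label = x[0] + x[1] > 0 ? 1 : 0;

                // Predict with the logistic function, then take one
                // gradient step on the log-loss.
                double dot = 0;
                for (int d = 0; d < dims; d++) {
                    dot += weights[d] * x[d];
                }
                double p = 1.0 / (1.0 + Math.exp(-dot));
                double error = label - p;
                for (int d = 0; d < dims; d++) {
                    weights[d] += learningRate * error * x[d];
                }
            }
            System.out.println("w[0]=" + weights[0] + ", w[1]=" + weights[1]);
        }
    }

A few dozen floating-point operations per example means millions of
examples per second on commodity hardware; the bottleneck is reading the
data, not the learning.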

In any case, I think that the focus of Mahout should be scalable learning.
Period.

The methods used should be drawn from a useful toolkit that prominently
includes Hadoop.  And Lucene.  And some linear algebra stuff.  And Taste.

This leaves open whether the focus of the book should be scalable learning
or whether it should be learning with Hadoop.

On Tue, Sep 22, 2009 at 10:18 AM, Sean Owen <sro...@gmail.com> wrote:

> The difference being, not emphasizing Hadoop? I understand that. I
> also recall we'd agreed that we were not realistically considering any
> other distributed processing framework in the near future, which I
> took to mean before v1.0?
>
> On Tue, Sep 22, 2009 at 11:59 AM, Ted Dunning <ted.dunn...@gmail.com>
> wrote:
> > I would amend that (again) to clustering, classification and
> > recommendations at scale.  With Hadoop where necessary.
>



-- 
Ted Dunning, CTO
DeepDyve
