There is certainly no reason to make 'using Hadoop, and nothing else'
a long-term goal. But I think there are many reasons to focus on Hadoop
in the short term, and I think this book is about the short term:
Mahout v1.0.

That is, I don't disagree. There's every reason to state the
long-term goal of Mahout correctly, while saying that the book will
cover Mahout + Hadoop, because that's what Mahout v1.0 does,
and the book is a first edition about v1.0.

I suppose I should emphasize that I think the book ought to be a
cookbook (as Tanton just suggested) rather than a more theoretical
book about how these techniques must be approached differently at
scale. For one thing, I can't write that theoretical book. And if
there is a book to be written right now, and it seems I've got the
most time to put into it, it would be more about how to use Mahout
v1.0, as it actually is, in practice.

But that raises the question: should that book be written? Would it be
better to consider a different style of book later?

On Tue, Sep 22, 2009 at 12:34 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> The difference being that we focus on scalable learning.  This might involve Hadoop
> for some, all or none of the steps.
>
> My definition of scalable is "handles data as big as nearly anybody
> produces".  That may or may not require Hadoop to do.  Many on-line learning
> systems are so fast that a single machine can munch near-Google-scale
> amounts of data in a few hours.  Many other algorithms might require Hadoop
> for an aggregation step, but nothing else.  Other algorithms might depend on
> a cluster of Lucene nodes.
>
> In any case, I think that the focus of Mahout should be scalable learning.
> Period.
>
> The methods used should be drawn from a useful toolkit which prominently
> includes Hadoop.  And Lucene.  And some linear algebra stuff.  And Taste.
>
> This leaves open whether the focus of the book should be scalable learning
> or whether it should be learning with Hadoop.
