There is certainly no reason to make 'using Hadoop, and nothing else' a long-term goal. I think there are many reasons to focus on Hadoop in the short term. And I think this book is about the short term, Mahout v1.0.
That is, I don't disagree -- there's every reason to state the long-term goal of Mahout correctly, while saying that the book will be talking about Mahout + Hadoop, because that's what Mahout v1.0 does, and the book is a 1st edition about v1.0.

I suppose I should emphasize that I think the book ought to be a cookbook (as Tanton just suggested) rather than a more theoretical book about how these techniques must be approached differently at scale. At least, I can't write that theoretical book. At the moment, if there is a book to be written, and it seems I've got the most time to put into it, it would be more about how to use Mahout v1.0, as it stands, in practice.

But that opens the question -- should that be written? Would it be better to consider a different style of book, later?

On Tue, Sep 22, 2009 at 12:34 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> The difference being that we focus on scalable. This might involve hadoop
> for some, all or none of the steps.
>
> My definition of scalable is "handles data as big as nearly anybody
> produces". That may or may not require Hadoop to do. Many on-line learning
> systems are so fast that a single machine can munch near google scale
> amounts of data in a few hours. Many other algorithms might require Hadoop
> for an aggregation step, but nothing else. Other algorithms might depend on
> a cluster of Lucene nodes.
>
> In any case, I think that the focus of Mahout should be scalable learning.
> Period.
>
> The methods used should be drawn from a useful toolkit which prominently
> includes Hadoop. And Lucene. And some linear algebra stuff. And Taste.
>
> This leaves open whether the focus of the book should be scalable learning
> or whether it should be learning with Hadoop.