I've been a longtime lurker and I'm still getting used to the ins and outs of Mahout (I've made some hacks to the source in my own environment and done some testing, but nothing in production yet), but I'd love to help out on a book, maybe with some of the background material.

Maybe I'm the only one who feels this way, but any Mahout book should have some basic introductory background material: some discussion of machine learning (classification, clustering), high-level overviews of the algorithms, and maybe some case studies/examples (why use Mahout vs. other tools?). And of course, the standard intro chapter on MapReduce, HDFS, and the rest of the Hadoop environment (including deploying on EC2/S3).

Again, it's probably best to sort out what does/doesn't belong, but first I think it would be a useful exercise to figure out who the intended audience really is. In my mind I would break it down into a few possibilities:

1. Java developers looking to incorporate ML algorithms into their existing projects/software.
2. People from more of an academic background, well versed in ML, IR, NLP, etc., who are looking for an efficient and scalable software tool to use.
3. Devs from a non-Java environment (obviously no one is going to write a beginner's Java guide, but it would help to highlight parts of the API that may be able to interface with other tools -- I have a small library of Python wrappers I use to set up and run some routine tasks).

On Tue, Sep 22, 2009 at 12:17 PM, Sean Owen <sro...@gmail.com> wrote:

> As I mentioned to some of you, there's a proposal to begin work on a
> book on Mahout. It sounds early, but the publisher assures me it's
> about the right time to begin, if we want a book out at roughly the
> time '1.0' rolls out in a year or so. I've heard support for the idea,
> and think it's a good thing.
>
> I'm going to move forward drafting a proposal and draft outline of
> such a thing. It seems so far I am the (only?) one interested in
> significant work in writing such a thing, which is cool, so I can
> drive this -- but I'd be concerned if it were just me speaking for the
> project book. Hence:
>
> - Who else might be interested in being a co-author and putting in
> significant work?
> - Would anyone care to read the proposal before I send it in?
> - Would anyone help me, in the short term, draft an outline of the
> content of the classification and clustering sections?
>
> Sean
>

--
Zaki Rahaman