+1 to commercial involvement. "Commercial" coming from folks like Sean or Jake translates to me as "productive" and "large scale" and "realistic applications", not (just) rebranding and marketing.
Projects like Mahout need a commercial spin to prove and sustain their applicability for realistic problems. Otherwise they become Weka-ish (not that I have anything against Weka -- I like it, but I think it's not really useful beyond spike prototypes, experimenting and research).

Dawid

On Thu, Apr 5, 2012 at 1:18 AM, Jake Mannix <[email protected]> wrote:
> +1 to everything Ted said.
>
> As an added point, while we're on the subject of corporate involvement,
> forks, and extensions of Mahout, now is as good a time as any to announce
> that I (and my teammate Andy Schlaikjer) are maintaining an official
> "Twitter fork" of Mahout (hosted and worked on entirely in the open on
> GitHub: http://github.com/twitter/mahout ), which we'll be making patches
> off of to submit back to Apache trunk on a periodic basis.
>
> You might well ask: why not just submit JIRA tickets and patches
> directly, esp. because this twitter team has a committer? The reasoning I
> had was one of expedience and safety: there are modifications and
> improvements which I have wanted available in our internal build (which
> pulls from our corp maven repo), but which still haven't undergone solid
> testing.
>
> I could apply patches to a particular trunk svn rev, and deploy that
> internally (like lots of places have "hadoop-0.20.3+patch5" and we have
> patched pig, etc), but a) I like being able to just commit to a git repo,
> pull in changes, iterate, test, cut a release tag, push immediately into
> maven for consumption by appropriate internal projects; and b) I wanted it
> out in the open to keep myself honest: doing it internally would open the
> possibility of accidentally mixing private and public code, and also, if I
> get lazy and don't contribute the code back to trunk, anyone else is free
> to generate a patch and do it themselves (cf. slowness of getting
> HBase/HDFS fixes out of Facebook, historically).
>
> Right now, twitter's fork is primarily focused around LDA / topic
> modeling work, but recently I've been also working on a nice little jruby
> REPL wrapper. Currently it only supports loading SequenceFiles of
> dictionaries and Vectors into memory and running LDA inference and
> introspecting on the models themselves. Invokable via
> "$MAHOUT_HOME/bin/mahout console" if you have JRUBY_HOME defined. That
> console provides a WAY faster way to inspect models, vectors, etc, and in
> fact would be a great place to launch jobs from, if we take the approach
> mentioned recently of having the run() method of AbstractJob be async, and
> return a handle on the current running state of the job. Then you could
> start up a console in screen, launch your job, and check in on it.
>
> Not to threadjack, but if we're talking about forks, commercial
> development and so forth, I thought now was as good a time as any to talk
> about this!
>
> -jake
>
> On Apr 4, 2012 2:36 PM, "Ted Dunning" <[email protected]> wrote:
>
> With this announcement, this group has a fork in the road facing us.
>
> We can choose the Hadoop path of forcibly excluding anybody with a slightly
> wrong commercial taint from discussions (I call this the "more GNU than
> GNU" philosophy).
>
> Or we can choose a real community based approach that includes vendors
> regardless of how they use the code that we freely give away via the Apache
> Mahout project (I call this "the Apache way").
>
> As you may guess from the way that I phrase these options, I would prefer
> the second approach.
>
> As such, I would like it if we could resolve as a group that we very much
> welcome what Sean is doing as an augmentation rather than diminution of the
> major role that he has played in Mahout so far. More than that, I would
> like to go on record saying that I, at least, am happy to have all kinds of
> participation in Mahout.
>
> Is this the consensus here?
> I think it is important to bring this subject
> up early and get a definitive consensus rather than let it drift.
>
> On Wed, Apr 4, 2012 at 12:33 PM, Sean Owen <[email protected]> wrote:
> Dear all -- I've long pro...
