I also appreciate your project, Sean. I think it's crucial to have 'battle-tested' code flow back into Mahout, and commercial software on top of open source increases the visibility of the open source project.
--sebastian

On 05.04.2012 09:13, Dawid Weiss wrote:
> +1 to commercial involvement. "Commercial" coming from folks like Sean
> or Jake translates to me as "productive" and "large scale" and
> "realistic applications", not (just) rebranding and marketing.
>
> Projects like Mahout need a commercial spin to prove and sustain their
> applicability for realistic problems. Otherwise they become Weka-ish
> (not that I have anything against Weka -- I like it, but I think it's
> not really useful beyond spike prototypes, experimenting and
> research).
>
> Dawid
>
> On Thu, Apr 5, 2012 at 1:18 AM, Jake Mannix <[email protected]> wrote:
>> +1 to everything Ted said.
>>
>> As an added point, while we're on the subject of corporate involvement,
>> forks, and extensions of Mahout, now is as good a time as any to announce
>> that I (and my teammate Andy Schlaikjer) are maintaining an official
>> "Twitter fork" of Mahout (hosted and worked on entirely in the open on
>> GitHub: http://github.com/twitter/mahout ), which we'll be making patches
>> off of to submit back to Apache trunk on a periodic basis.
>>
>> You might well ask: why not just submit JIRA tickets and patches
>> directly, esp. because this twitter team has a committer? The reasoning I
>> had was one of expedience and safety: there are modifications and
>> improvements which I have wanted available in our internal build (which
>> pulls from our corp maven repo), but still haven't undergone solid
>> testing.
>>
>> I could apply patches to a particular trunk svn rev, and deploy that
>> internally (like lots of places have "hadoop-0.20.3+patch5" and we have
>> patched pig, etc), but a) I like being able to just commit to a git repo,
>> pull in changes, iterate, test, cut a release tag, push immediately into
>> maven for consumption by appropriate internal projects; and b) I wanted it
>> out in the open to keep myself honest: doing it internally would open the
>> possibility of accidentally mixing private and public code, and also, if I
>> get lazy and don't contribute the code back to trunk, anyone else is free
>> to generate a patch and do it themselves (c.f. slowness of getting
>> HBase/HDFS fixes out of Facebook, historically).
>>
>> Right now, twitter's fork is primarily focused around LDA / topic
>> modeling work, but recently I've been also working on a nice little jruby
>> REPL wrapper. Currently it only supports loading SequenceFiles of
>> dictionaries and Vectors into memory and running LDA inference and
>> introspecting on the models themselves. Invokable via
>> "$MAHOUT_HOME/bin/mahout console" if you have JRUBY_HOME defined. That
>> console provides a WAY faster way to inspect models, vectors, etc, and in
>> fact would be a great place to launch jobs from, if we take the approach
>> mentioned recently of having the run() method of AbstractJob be async, and
>> return a handle on the current running state of the job. Then you could
>> start up a console in screen, launch your job, and check in on it.
>>
>> Not to threadjack, but if we're talking about forks, commercial
>> development and so forth, I thought now was as good a time as any to talk
>> about this!
>>
>> -jake
>>
>> On Apr 4, 2012 2:36 PM, "Ted Dunning" <[email protected]> wrote:
>>
>> With this announcement, this group has a fork in the road facing us.
>>
>> We can choose the Hadoop path of forcibly excluding anybody with a slightly
>> wrong commercial taint from discussions (I call this the "more GNU than
>> GNU" philosophy).
>>
>> Or we can choose a real community based approach that includes vendors
>> regardless of how they use the code that we freely give away via the Apache
>> Mahout project (I call this "the Apache way").
>>
>> As you may guess from the way that I phrase these options, I would prefer
>> the second approach.
>>
>> As such, I would like it if we could resolve as a group that we very much welcome
>> what Sean is doing as an augmentation rather than diminution of the major
>> role that he has played in Mahout so far. More than that, I would like to
>> go on record saying that I, at least, am happy to have all kinds of
>> participation in Mahout.
>>
>> Is this the consensus here? I think it is important to bring this subject
>> up early and get a definitive consensus rather than let it drift.
>>
>> On Wed, Apr 4, 2012 at 12:33 PM, Sean Owen <[email protected]> wrote:
>> > Dear all -- I've long pro...
