The site looks great. +1 to the commercial involvement (coming from the guy
who cannot do that due to the nature of his job). This really shows how far
this project has come from being just a spinoff of Lucene.

Beyond just battle-hardening the codebase, there is a "quality" aspect in
Mahout. Since Mahout code is open, anyone can pick up the code and measure
its quality on their own data or on open datasets. With commercial
interests coming into play, it is in each company's interest to pitch its
quality as the best by hiding the secret sauce, without giving the open
source version a fair chance to match up. And it's also never the right of
the community to ask for the secret sauce, because each company worked
hard for it.

I am not proposing a solution or saying anything negative here. I very
much welcome this. However, I am curious how other projects balance these
aspects and how we can reduce (if not eliminate) the disincentive to
publish back some of the secret sauce (again, coming from a guy who has his
lips sealed :)).



------
Robin Anil


On Thu, Apr 5, 2012 at 1:37 PM, Sebastian Schelter <[email protected]> wrote:

> I also appreciate your project, Sean. I think it's crucial to have
> 'battle-tested' code flow back into Mahout and commercial software on
> top of open source increases the visibility of the open source project.
>
> --sebastian
>
> On 05.04.2012 09:13, Dawid Weiss wrote:
> > +1 to commercial involvement. "Commercial" coming from folks like Sean
> > or Jake translates to me as "productive" and "large scale" and
> > "realistic applications", not  (just) rebranding and marketing.
> >
> > Projects like Mahout need a commercial spin to prove and sustain their
> > applicability for realistic problems. Otherwise they become Weka-ish
> > (not that I have anything against Weka -- I like it, but I think it's
> > not really useful beyond spike prototypes, experimenting and
> > research).
> >
> > Dawid
> >
> > On Thu, Apr 5, 2012 at 1:18 AM, Jake Mannix <[email protected]> wrote:
> >> +1 to everything Ted said.
> >>
> >>  As an added point, while we're on the subject of corporate involvement,
> >> forks, and extensions of Mahout, now is as good a time as any to announce
> >> that I (and my teammate Andy Schlaikjer) are maintaining an official
> >> "Twitter fork" of Mahout (hosted and worked on entirely in the open on
> >> GitHub: http://github.com/twitter/mahout ), which we'll be making patches
> >> off of to submit back to Apache trunk on a periodic basis.
> >>
> >>  You might well ask: why not just submit JIRA tickets and patches
> >> directly, esp. because this twitter team has a committer?  The reasoning I
> >> had was one of expedience and safety: there are modifications and
> >> improvements which I have wanted available in our internal build (which
> >> pulls from our corp maven repo), but still haven't undergone solid
> >> testing.
> >>
> >>  I could apply patches to a particular trunk svn rev, and deploy that
> >> internally (like lots of places have "hadoop-0.20.3+patch5" and we have
> >> patched pig, etc), but a) I like being able to just commit to a gitrepo,
> >> pull in changes, iterate, test, cut a release tag, push immediately into
> >> maven for consumption by appropriate internal projects; and b) I wanted it
> >> out in the open to keep myself honest: doing it internally would open the
> >> possibility of accidentally mixing private and public code, and also, if I
> >> get lazy and don't contribute the code back to trunk, anyone else is free
> >> to generate a patch and do it themselves (cf. slowness of getting
> >> HBase/HDFS fixes out of Facebook, historically).
> >>
> >>  Right now, twitter's fork is primarily focused around LDA / topic
> >> modeling work, but recently I've also been working on a nice little jruby
> >> REPL wrapper.  Currently it only supports loading SequenceFiles of
> >> dictionaries and Vectors into memory and running LDA inference and
> >> introspecting on the models themselves.  Invokable via
> >> "$MAHOUT_HOME/bin/mahout console" if you have JRUBY_HOME defined.  That
> >> console provides a WAY faster way to inspect models, vectors, etc, and in
> >> fact would be a great place to launch jobs from, if we take the approach
> >> mentioned recently of having the run() method of AbstractJob be async, and
> >> return a handle on the current running state of the job.  Then you could
> >> start up a console in screen, launch your job, and check in on it.
> >>
> >>  Not to threadjack, but if we're talking about forks, commercial
> >> development and so forth, I thought now was as good a time as any to talk
> >> about this!
> >>
> >>  -jake
> >>
> >> On Apr 4, 2012 2:36 PM, "Ted Dunning" <[email protected]> wrote:
> >>
> >> With this announcement, this group has a fork in the road facing us.
> >>
> >> We can choose the Hadoop path of forcibly excluding anybody with a slightly
> >> wrong commercial taint from discussions (I call this the "more GNU than
> >> GNU" philosophy).
> >>
> >> Or we can choose a real community based approach that includes vendors
> >> regardless of how they use the code that we freely give away via the Apache
> >> Mahout project (I call this "the Apache way").
> >>
> >> As you may guess from the way that I phrase these options, I would prefer
> >> the second approach.
> >>
> >> As such, I would like it if we could resolve as a group that we very much
> >> welcome what Sean is doing as an augmentation rather than diminution of the
> >> major role that he has played in Mahout so far.  More than that, I would
> >> like to
> >> go on record saying that I, at least, am happy to have all kinds of
> >> participation in Mahout.
> >>
> >> Is this the consensus here?  I think it is important to bring this subject
> >> up early and get a definitive consensus rather than let it drift.
> >>
> >> On Wed, Apr 4, 2012 at 12:33 PM, Sean Owen <[email protected]> wrote:
> >> > Dear all -- I've long pro...
>
>
