+1 to commercial involvement. "Commercial" coming from folks like Sean
or Jake translates to me as "productive" and "large scale" and
"realistic applications", not (just) rebranding and marketing.

Projects like Mahout need a commercial spin to prove and sustain their
applicability for realistic problems. Otherwise they become Weka-ish
(not that I have anything against Weka -- I like it, but I think it's
not really useful beyond spike prototypes, experimenting and
research).

Dawid

On Thu, Apr 5, 2012 at 1:18 AM, Jake Mannix <[email protected]> wrote:
> +1 to everything Ted said.
>
>  As an added point, while we're on the subject of corporate involvement,
> forks, and extensions of Mahout, now is as good a time as any to announce
> that I (and my teammate Andy Schlaikjer) are maintaining an official
> "Twitter fork" of Mahout (hosted and worked on entirely in the open on
> GitHub: http://github.com/twitter/mahout ), which we'll be making patches
> off of to submit back to Apache trunk on a periodic basis.
>
>  You might well ask: why not just submit JIRA tickets and patches
> directly, esp. because this twitter team has a committer?  The reasoning I
> had was one of expedience and safety: there are modifications and
> improvements which I have wanted available in our internal build (which
> pulls from our corp maven repo), but still haven't undergone solid
> testing.
>
>  I could apply patches to a particular trunk svn rev, and deploy that
> internally (like lots of places have "hadoop-0.20.3+patch5" and we have
> patched pig, etc), but a) I like being able to just commit to a git repo,
> pull in changes, iterate, test, cut a release tag, push immediately into
> maven for consumption by appropriate internal projects; and b) I wanted it
> out in the open to keep myself honest: doing it internally would open the
> possibility of accidentally mixing private and public code, and also, if I
> get lazy and don't contribute the code back to trunk, anyone else is free
> to generate a patch and do it themselves (cf. slowness of getting
> HBase/HDFS fixes out of Facebook, historically).
>
>  Right now, twitter's fork is primarily focused around LDA / topic
> modeling work, but recently I've been also working on a nice little jruby
> REPL wrapper.  Currently it only supports loading SequenceFiles of
> dictionaries and Vectors into memory and running LDA inference and
> introspecting on the models themselves.  Invokable via
> "$MAHOUT_HOME/bin/mahout console" if you have JRUBY_HOME defined.  That
> console provides a WAY faster way to inspect models, vectors, etc, and in
> fact would be a great place to launch jobs from, if we take the approach
> mentioned recently of having the run() method of AbstractJob be async, and
> return a handle on the current running state of the job.  Then you could
> start up a console in screen, launch your job, and check in on it.
>
>  Not to threadjack, but if we're talking about forks, commercial
> development and so forth, I thought now was as good a time as any to talk
> about this!
>
>  -jake
>
> On Apr 4, 2012 2:36 PM, "Ted Dunning" <[email protected]> wrote:
>
> With this announcement, this group has a fork in the road facing us.
>
> We can choose the Hadoop path of forcibly excluding anybody with a slightly
> wrong commercial taint from discussions (I call this the "more GNU than
> GNU" philosophy).
>
> Or we can choose a real community based approach that includes vendors
> regardless of how they use the code that we freely give away via the Apache
> Mahout project (I call this "the Apache way").
>
> As you may guess from the way that I phrase these options, I would prefer
> the second approach.
>
> As such, I would like it if we could resolve as a group that we very much welcome
> what Sean is doing as an augmentation rather than diminution of the major
> role that he has played in Mahout so far.  More than that, I would like to
> go on record saying that I, at least, am happy to have all kinds of
> participation in Mahout.
>
> Is this the consensus here?  I think it is important to bring this subject
> up early and get a definitive consensus rather than let it drift.
>
> On Wed, Apr 4, 2012 at 12:33 PM, Sean Owen <[email protected]> wrote:
> > Dear all -- I've long pro...
