I also appreciate your project, Sean. I think it's crucial to have
'battle-tested' code flow back into Mahout, and commercial software on
top of open source increases the visibility of the open source project.

--sebastian

On 05.04.2012 09:13, Dawid Weiss wrote:
> +1 to commercial involvement. "Commercial" coming from folks like Sean
> or Jake translates to me as "productive" and "large scale" and
> "realistic applications", not (just) rebranding and marketing.
> 
> Projects like Mahout need a commercial spin to prove and sustain their
> applicability for realistic problems. Otherwise they become Weka-ish
> (not that I have anything against Weka -- I like it, but I think it's
> not really useful beyond spike prototypes, experimenting and
> research).
> 
> Dawid
> 
> On Thu, Apr 5, 2012 at 1:18 AM, Jake Mannix <[email protected]> wrote:
>> +1 to everything Ted said.
>>
>>  As an added point, while we're on the subject of corporate involvement,
>> forks, and extensions of Mahout, now is as good a time as any to announce
>> that I (and my teammate Andy Schlaikjer) am maintaining an official
>> "Twitter fork" of Mahout (hosted and worked on entirely in the open on
>> GitHub: http://github.com/twitter/mahout ), which we'll be making patches
>> off of to submit back to Apache trunk on a periodic basis.
>>
>>  You might well ask: why not just submit JIRA tickets and patches
>> directly, esp. because this twitter team has a committer?  The reasoning I
>> had was one of expedience and safety: there are modifications and
>> improvements which I have wanted available in our internal build (which
>> pulls from our corp maven repo), but still haven't undergone solid
>> testing.
>>
>>  I could apply patches to a particular trunk svn rev, and deploy that
>> internally (like lots of places have "hadoop-0.20.3+patch5" and we have
>> patched pig, etc), but a) I like being able to just commit to a git repo,
>> pull in changes, iterate, test, cut a release tag, push immediately into
>> maven for consumption by appropriate internal projects; and b) I wanted it
>> out in the open to keep myself honest: doing it internally would open the
>> possibility of accidentally mixing private and public code, and also, if I
>> get lazy and don't contribute the code back to trunk, anyone else is free
>> to generate a patch and do it themselves (cf. slowness of getting
>> HBase/HDFS fixes out of Facebook, historically).
>>
>>  Right now, twitter's fork is primarily focused around LDA / topic
>> modeling work, but recently I've also been working on a nice little jruby
>> REPL wrapper.  Currently it only supports loading SequenceFiles of
>> dictionaries and Vectors into memory and running LDA inference and
>> introspecting on the models themselves.  Invokable via
>> "$MAHOUT_HOME/bin/mahout console" if you have JRUBY_HOME defined.  That
>> console provides a WAY faster way to inspect models, vectors, etc, and in
>> fact would be a great place to launch jobs from, if we take the approach
>> mentioned recently of having the run() method of AbstractJob be async, and
>> return a handle on the current running state of the job.  Then you could
>> start up a console in screen, launch your job, and check in on it.
>>
>>  Not to threadjack, but if we're talking about forks, commercial
>> development and so forth, I thought now was as good a time as any to talk
>> about this!
>>
>>  -jake
>>
>> On Apr 4, 2012 2:36 PM, "Ted Dunning" <[email protected]> wrote:
>>
>> With this announcement, this group has a fork in the road facing us.
>>
>> We can choose the Hadoop path of forcibly excluding anybody with a slightly
>> wrong commercial taint from discussions (I call this the "more GNU than
>> GNU" philosophy).
>>
>> Or we can choose a real community based approach that includes vendors
>> regardless of how they use the code that we freely give away via the Apache
>> Mahout project (I call this "the Apache way").
>>
>> As you may guess from the way that I phrase these options, I would prefer
>> the second approach.
>>
>> As such, I would like it if we could resolve as a group that we very much welcome
>> what Sean is doing as an augmentation rather than diminution of the major
>> role that he has played in Mahout so far.  More than that, I would like to
>> go on record saying that I, at least, am happy to have all kinds of
>> participation in Mahout.
>>
>> Is this the consensus here?  I think it is important to bring this subject
>> up early and get a definitive consensus rather than let it drift.
>>
>> On Wed, Apr 4, 2012 at 12:33 PM, Sean Owen <[email protected]> wrote:
>> > Dear all -- I've long pro...
