Re: Codebase refactoring proposal

Pat Ferrel Thu, 05 Feb 2015 07:43:35 -0800

From my own perspective:

I’m not aware of any rule that all operations must be engine-agnostic. In fact,
several engine-specific exceptions are discussed in this long email. We’ve
talked about reduce or join operations that would be difficult to make agnostic
without a lot of knowledge of ALL other engines. Unless or until we get
contributors from those engines reviewing commits, why put this burden on all
of us?

The agnostic DSL was meant for linear algebra ops, not for all distributed
computation methods. We aren’t building a generic engine, only engine-agnostic
algebra.

You have added stubs in H2O for the distributed aggregations. This seems fine,
but I wouldn’t vote to require that. If GSGD requires further use of
Spark-specific operations, so be it. This means that GSGD may live in the Spark
module, with any required algebra bits added to math-scala. Does anyone have a
problem with that?

My vote on #62: ship it.

On the point of interoperability with MLlib, we still need to talk about that,
but in another email.

On Feb 5, 2015, at 1:14 AM, Gokhan Capan <gkhn...@gmail.com> wrote:

My question is this: for algorithms that include both engine-specific
operations (such as aggregation) and DSL operations, what is the best way of
handling them?

i) should we add the distributed operations to Mahout codebase as it is
proposed in #62?

ii) should we have [engine]-ml modules (like spark-bindings and
h2o-bindings) where we can mix the DSL and engine-specific stuff?

Picking i. has the advantage that an ML algorithm is written once and can then
run on alternative engines, but it requires wrapping/duplicating existing
distributed operations.

Picking ii. has the advantage of avoiding writing distributed operations,
but since we're mixing the DSL and the engine-specific stuff, an
ML-algorithm written for an engine would not be available for the others.
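
The trade-off between i. and ii. can be sketched in a few lines. This is only
an illustration in Python, not Mahout code: Engine, SequentialEngine,
ThreadedEngine, mean and mean_threaded_only are all hypothetical names, and the
two "engines" are local stand-ins rather than Spark or H2O.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


class Engine(ABC):
    """Hypothetical engine-neutral interface (option i); not a Mahout API."""

    @abstractmethod
    def aggregate(self, partitions, zero, seq_op, comb_op):
        """Fold each partition with seq_op, then merge partials with comb_op."""


class SequentialEngine(Engine):
    # Stand-in for one backend: folds the partitions one after another.
    def aggregate(self, partitions, zero, seq_op, comb_op):
        partials = [reduce(seq_op, part, zero) for part in partitions]
        return reduce(comb_op, partials, zero)


class ThreadedEngine(Engine):
    # Stand-in for a second backend: folds the partitions in a thread pool.
    def aggregate(self, partitions, zero, seq_op, comb_op):
        with ThreadPoolExecutor() as pool:
            partials = list(
                pool.map(lambda part: reduce(seq_op, part, zero), partitions)
            )
        return reduce(comb_op, partials, zero)


def mean(engine, partitions):
    """Option i: written once against Engine, runs on any backend."""
    total, count = engine.aggregate(
        partitions,
        (0.0, 0),
        lambda acc, x: (acc[0] + x, acc[1] + 1),   # fold one value into (sum, n)
        lambda a, b: (a[0] + b[0], a[1] + b[1]),   # merge two (sum, n) partials
    )
    return total / count


def mean_threaded_only(partitions):
    """Option ii: uses one backend's primitives directly, so it is not portable."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda part: (sum(part), len(part)), partitions))
    return sum(s for s, _ in partials) / sum(n for _, n in partials)


partitions = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
```

With i., every engine has to supply its own aggregate wrapper before mean can
run on it. With ii., no wrapper is needed, but mean_threaded_only has to be
rewritten for each additional engine.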

I just wanted to hear some opinions.

Gokhan

On Thu, Feb 5, 2015 at 4:11 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> I took it that Gokhan had objections himself, based on his comments, if we
> are talking about #62.
>
> He also expressed concerns about computing GSGD, but I suspect it can still
> be algebraically computed.
> 
> On Wed, Feb 4, 2015 at 5:52 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> 
>> BTW Ted and Andrew have both expressed interest in the distributed
>> aggregation stuff. It sounds like we are agreeing that non-algebra things,
>> such as computation methods, can be engine specific.
>> 
>> So does anyone have an objection to Gokhan pushing his PR?
>> 
>> On Feb 4, 2015, at 2:20 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>> 
>> On Wed, Feb 4, 2015 at 1:51 PM, Andrew Palumbo <ap....@outlook.com> wrote:
>> 
>>> My thought was not to bring primitive engine-specific aggregators,
>>> combiners, etc. into math-scala.
>>> 
>> 
>> Yeah. +1. I would like to support that as an experiment, see where it goes.
>> Clearly some distributed use cases are simple enough while also pervasive
>> enough.
