From my own perspective: I’m not aware of any rule that all operations must be engine-agnostic. In fact, several engine-specific exceptions are discussed in this long email. We’ve talked about reduce or join operations that would be difficult to make agnostic without a lot of knowledge of ALL the other engines. Unless or until we get contributors from those engines reviewing commits, why put this burden on all of us?
An agnostic DSL was for linear algebra ops, not all distributed computation methods. We aren’t doing a generic engine, only engine-agnostic algebra. You have added stubs in H2O for the distributed aggregations. This seems fine, but I wouldn’t vote to require that. If GSGD requires further use of Spark-specific operations, so be it. This means that GSGD may live in the Spark module, with any algebra bits it requires added to math-scala. Does anyone have a problem with that?

My vote on #62: ship it.

On the point of interoperability with MLlib, we still need to talk about that, but in another email.

On Feb 5, 2015, at 1:14 AM, Gokhan Capan <gkhn...@gmail.com> wrote:

What I am saying is that for certain algorithms that include both engine-specific operations (such as aggregation) and DSL stuff, what is the best way of handling them?

i) Should we add the distributed operations to the Mahout codebase, as proposed in #62?

ii) Should we have [engine]-ml modules (like spark-bindings and h2o-bindings) where we can mix the DSL and engine-specific stuff?

Picking i) has the advantage that an ML algorithm is written once and can then run on alternative engines, but it requires wrapping/duplicating existing distributed operations.

Picking ii) has the advantage of avoiding writing distributed operations, but since we're mixing the DSL and the engine-specific stuff, an ML algorithm written for one engine would not be available for the others.

I just wanted to hear some opinions.

Gokhan

On Thu, Feb 5, 2015 at 4:11 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> I took it Gokhan had objections himself, based on his comments, if we are
> talking about #62.
>
> He also expressed concerns about computing GSGD, but I suspect it can still
> be algebraically computed.
>
> On Wed, Feb 4, 2015 at 5:52 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
>> BTW Ted and Andrew have both expressed interest in the distributed
>> aggregation stuff.
>> It sounds like we are agreeing that non-algebra, computation-method type
>> things can be engine specific.
>>
>> So does anyone have an objection to Gokhan pushing his PR?
>>
>> On Feb 4, 2015, at 2:20 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>
>> On Wed, Feb 4, 2015 at 1:51 PM, Andrew Palumbo <ap....@outlook.com> wrote:
>>
>>> My thought was not to bring primitive engine-specific aggregators,
>>> combiners, etc. into math-scala.
>>
>> Yeah. +1. I would like to support that as an experiment, see where it goes.
>> Clearly some distributed use cases are simple enough while also pervasive
>> enough.