On Thu, Feb 5, 2015 at 1:14 AM, Gokhan Capan <gkhn...@gmail.com> wrote:
> What I am saying is that for certain algorithms including both
> engine-specific (such as aggregation) and DSL stuff, what is the best way
> of handling them?
>
> i) should we add the distributed operations to Mahout codebase as it is
> proposed in #62?
>

IMO this can't go very well or very far (because of the engine specifics), but I'd be willing to see an experiment with simple things like map and reduce.

The bigger questions are where exactly we'll have to stop (we can't abstract all capabilities out there because of "common denominator" issues), and what percentage of methods it will truly allow to migrate to full backend portability.

And if, after doing all this, we still find ourselves writing engine-specific mixes, why bother? Wouldn't it be better to find a good, easy-to-replicate, incrementally developed pattern to register and apply engine-specific strategies for every method?

>
> ii) should we have [engine]-ml modules (like spark-bindings and
> h2o-bindings) where we can mix the DSL and engine-specific stuff?
>

This is not quite what I am proposing. Rather, I mean engine-ml modules holding the engine-specific _parts_ of an algorithm. However, this really needs a POC over a guinea pig (similar to how we POC'd algebra in the first place with ssvd and spca).

>
>
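
To make the "register and apply engine-specific strategies" idea above a bit more concrete, here is a rough sketch in plain Java. Everything in it is hypothetical: `StrategyRegistry`, `EngineStrategyDemo`, the `"sumOfSquares"` method name, and the `"spark"`/`"h2o"` keys are made up for illustration and are not Mahout APIs, and the lambdas are local stand-ins for real engine code. It only shows the shape of the pattern: one engine-neutral entry point per method, with the engine-specific part looked up at call time.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry: maps (method, engine) to an engine-specific implementation.
final class StrategyRegistry {
    private final Map<String, Function<double[], Double>> strategies = new HashMap<>();

    private static String key(String method, String engine) {
        return method + "@" + engine;
    }

    // Each engine module would register its own part of the algorithm here.
    void register(String method, String engine, Function<double[], Double> impl) {
        strategies.put(key(method, engine), impl);
    }

    // Engine-neutral entry point: fails loudly if an engine has no strategy yet.
    double apply(String method, String engine, double[] data) {
        Function<double[], Double> impl = strategies.get(key(method, engine));
        if (impl == null) {
            throw new UnsupportedOperationException(
                "No strategy registered for " + method + " on engine " + engine);
        }
        return impl.apply(data);
    }
}

final class EngineStrategyDemo {
    static StrategyRegistry defaultRegistry() {
        StrategyRegistry registry = new StrategyRegistry();
        // Stand-in for a Spark-specific aggregation (assumption: not real Spark code).
        registry.register("sumOfSquares", "spark",
            xs -> Arrays.stream(xs).map(x -> x * x).sum());
        // Stand-in for an H2O-specific aggregation (assumption: not real H2O code).
        registry.register("sumOfSquares", "h2o", xs -> {
            double sum = 0.0;
            for (double x : xs) {
                sum += x * x;
            }
            return sum;
        });
        return registry;
    }

    public static void main(String[] args) {
        StrategyRegistry registry = defaultRegistry();
        double[] data = {1.0, 2.0, 3.0};
        System.out.println(registry.apply("sumOfSquares", "spark", data)); // prints 14.0
        System.out.println(registry.apply("sumOfSquares", "h2o", data));   // prints 14.0
    }
}
```

The point of the lookup failing with an exception is that the pattern can be grown incrementally: a method becomes available on an engine only once that engine's module registers its part, and nothing forces a common-denominator abstraction up front.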