The current DSL is limited to algebraic operations. All methods fall into one of three categories:
(a) those that can be implemented entirely with distributed or in-core algebra;
(b) those that can be partially implemented using distributed or in-core algebra;
(c) those that do not use algebra at all (neither in-core, i.e. the basic notion of local matrices or vectors, nor distributed matrices).

Implementations in category (a) have no Spark dependencies at all and are therefore assumed to be 100% portable among Mahout back-ends (not that there are many to choose from right now). They go into the math-scala module.

We are also taking on things in category (b), on the assumption that these methods are at least partially portable at the moment. MAHOUT-1365 is one example: it is heavily invested in both algebra and message exchange. Note that this category includes methods that do any block-wise linear algebra, even if only in the form of simple in-memory vectors and perhaps the matrix types found in mahout-math (an extension of the Colt library).

As it stands, we don't have any purely (c) methods, and indeed I believe such methods may be totally engine-specific, in which case MLlib is one possible good home for them. However, I have yet to find a method that is totally vector-free (even probabilistic fitting operates on vectors and matrix blocks).

On Tue, Jun 17, 2014 at 5:59 PM, Andy Twigg <andy.tw...@gmail.com> wrote:

> Hi Sebastian - sorry about the lack of activity here. I've looked at
> the scala dsl, but I think it makes more sense to push this work into
> MLLib as it really relies on spark streaming and RDDs. I'm not sure how you
> would build the streaming abstraction within the current DSL setup.
> Let me know if I'm missing something.
>
> On 17 May 2014 23:23, Sebastian Schelter (JIRA) <j...@apache.org> wrote:
> >
> > [ https://issues.apache.org/jira/browse/MAHOUT-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> >
> > Sebastian Schelter resolved MAHOUT-1153.
> > ----------------------------------------
> >
> > Resolution: Won't Fix
> >
> > no activity for more than a month
> >
> >> Implement streaming random forests
> >> ----------------------------------
> >>
> >> Key: MAHOUT-1153
> >> URL: https://issues.apache.org/jira/browse/MAHOUT-1153
> >> Project: Mahout
> >> Issue Type: New Feature
> >> Components: Classification
> >> Reporter: Andy Twigg
> >> Labels: features
> >> Fix For: 1.0
> >>
> >>
> >> The current random forest implementations are in-core and not scalable.
> >> This issue is to add an out-of-core, scalable, streaming implementation.
> >> Initially it could be based on [1], and using mappers in a master-worker
> >> style.
> >> [1] http://jmlr.csail.mit.edu/papers/volume11/ben-haim10a/ben-haim10a.pdf
> >
> >
> >
> > --
> > This message was sent by Atlassian JIRA
> > (v6.2#6252)
>