The current DSL is limited to algebraic operations only. Broadly, all methods
fall into 3 categories:

(a) those that can be implemented 100% in distributed or in-core algebra;
(b) those that can be partially implemented using distributed or in-core
algebra;
(c) those that do not use algebra at all (neither in-core, i.e. the basic
notion of local matrices or vectors, nor distributed matrices).

Implementations in category (a) are done without any Spark dependencies and
are therefore assumed to be 100% portable among Mahout back-ends (not that
there is much to choose from right now). They go into the math-scala module.
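
To make category (a) concrete, here is a minimal sketch in plain Python rather
than the Scala DSL. The names AlgebraBackend, InCoreBackend and gram are
hypothetical and are not Mahout API; the point is only that a method written
purely against algebraic operations never mentions the engine.

```python
from abc import ABC, abstractmethod


class AlgebraBackend(ABC):
    """Hypothetical stand-in for an engine; not a Mahout interface."""

    @abstractmethod
    def transpose(self, a): ...

    @abstractmethod
    def matmul(self, a, b): ...


class InCoreBackend(AlgebraBackend):
    """Toy in-memory backend operating on lists of rows."""

    def transpose(self, a):
        return [list(col) for col in zip(*a)]

    def matmul(self, a, b):
        b_cols = list(zip(*b))
        return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
                for row in a]


def gram(backend, a):
    """Category (a) style method: A^T A expressed purely in algebra."""
    return backend.matmul(backend.transpose(a), a)


a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
result = gram(InCoreBackend(), a)
```

Any other backend implementing the same two operations could be passed to
gram unchanged, which is the sense of "portable" used above.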

We are also taking on things in category (b), on the assumption that these
methods are at least partially portable at the moment. MAHOUT-1365 is one
example: it is heavily invested in both algebra and message exchange. Note
that this would include methods that do any block-wise linear algebra, even
if only in the form of simple in-memory vectors and perhaps the matrix types
found in mahout-math (an extension of the Colt library).
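
As a rough illustration of category (b), the sketch below (plain Python, all
names hypothetical, not Mahout or Spark API) splits a computation into a
per-block step that is ordinary in-core algebra and a combine step that a
real engine would run as its own distributed map and reduce. Only the first
part is portable as-is.

```python
from functools import reduce


def block_gram(block):
    """In-core algebra on one row block: block^T block."""
    cols = list(zip(*block))
    return [[sum(x * y for x, y in zip(ci, cj)) for cj in cols]
            for ci in cols]


def add(m1, m2):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


# Row blocks of one 3x2 matrix; in a real engine these would live on
# different workers (assumption made for illustration only).
blocks = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]

partials = map(block_gram, blocks)  # stands in for an engine-specific map
total = reduce(add, partials)       # stands in for an engine-specific reduce
```

The local map and reduce here merely stand in for whatever exchange mechanism
the engine provides; block_gram itself needs nothing beyond local matrices.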

As it stands, we don't have any purely (c) methods, and indeed I believe such
methods may be totally engine-specific, in which case MLlib is possibly a
good home for them. However, I have yet to find a method that is totally
vector-free (even probabilistic fitting operates on vectors and matrix
blocks).


On Tue, Jun 17, 2014 at 5:59 PM, Andy Twigg <andy.tw...@gmail.com> wrote:

> Hi Sebastian - sorry about the lack of activity here. I've looked at
> the scala dsl, but I think it makes more sense to push this work into
> MLLib as it really relies on spark streaming and RDDs. I'm not sure how you
> would build the streaming abstraction within the current DSL setup.
> Let me know if I'm missing something.
>
> On 17 May 2014 23:23, Sebastian Schelter (JIRA) <j...@apache.org> wrote:
> >
> >      [
> https://issues.apache.org/jira/browse/MAHOUT-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
> ]
> >
> > Sebastian Schelter resolved MAHOUT-1153.
> > ----------------------------------------
> >
> >     Resolution: Won't Fix
> >
> > no activity for more than a month
> >
> >> Implement streaming random forests
> >> ----------------------------------
> >>
> >>                 Key: MAHOUT-1153
> >>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1153
> >>             Project: Mahout
> >>          Issue Type: New Feature
> >>          Components: Classification
> >>            Reporter: Andy Twigg
> >>              Labels: features
> >>             Fix For: 1.0
> >>
> >>
> >> The current random forest implementations are in-core and not scalable.
> This issue is to add an out-of-core, scalable, streaming implementation.
> Initially it could be based on [1], and using mappers in a master-worker
> style.
> >> [1]
> http://jmlr.csail.mit.edu/papers/volume11/ben-haim10a/ben-haim10a.pdf
> >
> >
> >
> > --
> > This message was sent by Atlassian JIRA
> > (v6.2#6252)
>
