A Lloyd k-means iteration can be expressed in purely algebraic form. I posted
a fragment of the code on this list at some point; I can probably search the
archive and dig it out.
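
To illustrate the idea (this is not the fragment mentioned above, only a
minimal sketch in plain Python, and every helper name in it is hypothetical):
one Lloyd iteration can be written with matrix products, a row-wise argmin,
and a diagonal scaling.

```python
# Illustrative sketch only: one Lloyd iteration written as matrix algebra.
# transpose, matmul and lloyd_step are hypothetical helper names.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def lloyd_step(X, C):
    """X: n x d data matrix, C: k x d centroid matrix; returns the updated C."""
    k = len(C)
    # Squared distances: D[i][j] = |x_i|^2 - 2 * (x_i . c_j) + |c_j|^2
    XCt = matmul(X, transpose(C))
    xn = [sum(v * v for v in row) for row in X]
    cn = [sum(v * v for v in row) for row in C]
    D = [[xn[i] - 2.0 * XCt[i][j] + cn[j] for j in range(k)]
         for i in range(len(X))]
    # Assignment matrix A (n x k): a one-hot row marking the nearest centroid.
    A = []
    for row in D:
        nearest = row.index(min(row))
        A.append([1.0 if j == nearest else 0.0 for j in range(k)])
    # New centroids: (A^T A)^-1 A^T X. A^T A is diagonal (the cluster sizes),
    # so the inverse reduces to dividing each row of A^T X by its cluster size.
    At = transpose(A)
    sums = matmul(At, X)
    sizes = [sum(r) for r in At]
    # An empty cluster keeps its previous centroid.
    return [[s / sizes[j] for s in sums[j]] if sizes[j] > 0 else list(C[j])
            for j in range(k)]
```

The only non-algebraic step here is the row-wise argmin that builds A;
everything else is a product, a transpose, or an element-wise operation.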

Various algorithm-inducing techniques are probably quasi-algebraic, which
means some things cannot be expressed in R-like algebra and will have to be
separated out into an engine-specific component.

I also doubt (really a lot) the wisdom of bringing a distributed reduce into
the matrix API.

First, any such attempt (such as introducing a reduce) will be crippling
w.r.t. actual engine capabilities (e.g. Spark has probably a couple of dozen
distributed primitives beyond reduce, and reduce in my experience is far from
the most frequently used one). For example, the implicit ALS paradigm is not
most efficiently implemented with Spark reduce (if it is possible at all).

Second, when you do distributed operations, being in the Matrix world (i.e.
dealing with a fixed row vector type) is also very constraining. Most of the
time you actually don't want the arguments of, say, Bagel operations to be
constrained to a single vector.

Third, bringing all engine capabilities under the same common denominator is
challenging at best. We end up either robbing ourselves of the engines' real
capabilities (per above) or emulating them, which would lead to unnatural
utilization and/or poorly performing algorithms.

Fourth, nor is it really necessary. More likely, a large strategy (like what
we have for the DistributedEngine strategy) for doing some frequent high-level
things in a particular engine -- even if they are specific to an algorithm
-- is a far more palatable way of abstracting these things (if there's even a
need to go abstract at all).
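
A rough sketch of the strategy idea, under loud assumptions: the class and
method names below are hypothetical and are not Mahout's DistributedEngine
API, and the "partitioned" engine is only a local stand-in for a distributed
one. The point is that the algorithm calls one high-level operation and each
engine implements it in whatever way is natural for that engine.

```python
from abc import ABC, abstractmethod

class EngineStrategy(ABC):
    """Hypothetical per-engine strategy for one high-level operation."""

    @abstractmethod
    def column_sums(self, rows):
        """Return the per-column sums of a list of equal-length rows."""

class InMemoryStrategy(EngineStrategy):
    def column_sums(self, rows):
        return [sum(col) for col in zip(*rows)]

class PartitionedStrategy(EngineStrategy):
    """Stand-in for a distributed engine: sum per partition, then combine."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions

    def column_sums(self, rows):
        n = self.num_partitions
        parts = [rows[i::n] for i in range(n)]
        partials = [[sum(col) for col in zip(*p)] for p in parts if p]
        return [sum(col) for col in zip(*partials)]

def column_means(rows, engine):
    # The algorithm only sees the high-level operation, not how it is run.
    totals = engine.column_sums(rows)
    return [t / len(rows) for t in totals]
```

For example, `column_means(data, InMemoryStrategy())` and
`column_means(data, PartitionedStrategy(2))` give the same result while
leaving each engine free to use its own primitives.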

There are also a lot of issues with Colt in-memory performance.



On Sun, Sep 28, 2014 at 1:15 PM, Aamir Khan <[email protected]> wrote:

> Hi,
>
> I am also new to Apache and Mahout. This thread caught my attention.
> Can you tell what are the areas where development is required.
> Is there any work on *Clustering*?
> Any guidance on how to start and useful links are highly appreciated.
>
> Many thanks,
>
>
> On Mon, Sep 29, 2014 at 1:19 AM, Ted Dunning <[email protected]>
> wrote:
>
> > Thejas,
> >
> > What were your impressions?
> >
> > Which parts of the system match your background and capabilities?
> >
> >
> >
> > On Sun, Sep 28, 2014 at 11:46 AM, Thejas Prasad <[email protected]>
> > wrote:
> >
> > > Hey suneel,
> > >
> > > I finished reading the paper.  What's next?
> > >
> > > Sent from my iPhone
> > >
> > > > On Sep 26, 2014, at 7:04 PM, Suneel Marthi <[email protected]>
> wrote:
> > > >
> > > > See this for a start
> > > > http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf
> > > >
> > > >
> > > >> On Fri, Sep 26, 2014 at 8:02 PM, thejas prasad <[email protected]
> >
> > > wrote:
> > > >>
> > > >> what exactly in the  scala math library?
> > > >>
> > > >>
> > > >>
> > > >> On Fri, Sep 26, 2014 at 1:00 AM, Ted Dunning <[email protected]
> >
> > > >> wrote:
> > > >>
> > > >>> Got it!
> > > >>>
> > > >>> Sorry to be dense.
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Thu, Sep 25, 2014 at 4:23 PM, Thejas Prasad <
> [email protected]>
> > > >>> wrote:
> > > >>>
> > > >>>> Sorry I meant to say what is the best way to get started**?
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Thejas
> > > >>>> Sent from my iPhone
> > > >>>>
> > > >>>>> On Sep 25, 2014, at 4:28 PM, Ted Dunning <[email protected]>
> > > >>> wrote:
> > > >>>>>
> > > >>>>>> On Thu, Sep 25, 2014 at 9:35 AM, Thejas Prasad <
> > [email protected]
> > > >>>
> > > >>>> wrote:
> > > >>>>>>
> > > >>>>>> what is the best way to get statues
> > > >>>>>
> > > >>>>>
> > > >>>>> Hmmm....
> > > >>>>>
> > > >>>>> I am totally confused.  You must have meant something here.
> > > >>>>>
> > > >>>>> Regarding your next question, the place to start work is on the
> > scala
> > > >>>> math
> > > >>>>> library.
> > > >>
> > >
> >
>
