On Mon, Aug 25, 2014 at 3:23 PM, Andrew Palumbo <[email protected]> wrote:

> Thanks Dmitriy,
>
> I've added in SSVD, PCA, QR and Weighted ALS.


I think it is called "regularized ALS"


> To keep it simple, I'll leave them under Spark for right now (and add
> "in development" for H2O), since they're in and passing tests.
>
> Should I add:
>

no

>
> GP-EI
> BFGS
>
> as "in development"
>
> bigram co-occurrence (would this be collocations?)
>
> as "in development" for spark?
>
>
>
>
> > Date: Mon, 25 Aug 2014 14:40:57 -0700
> > Subject: Re: Features by engine page
> > From: [email protected]
> > To: [email protected]
> >
> > Yes, SSVD and stochastic PCA as well as thin QR are re-cast in Mahout
> > algebra (meaning they are engine-independent, not just Spark).
> >
> > So is regularized ALS (albeit perhaps somewhat naive and thus affecting
> > performance).
> >
> > I also had quasi-algebraic implicit feedback ALS (which is in fact the
> > implicit feedback paper and ALS-WR in the same bottle) but closed the
> > issue due to lack of reviews and interest.
> >
> > Internally I also have a framework for doing hyperparameter searches and
> > right now am closing in on GP-EI, which will probably benefit from some
> > additions doing estimates chosen by reducing uncertainty (attempts to get
> > out of a local minimum projected by Snoek's GP-EI algorithm itself). I
> > hope I can open it one day. This work is obviously also interesting in
> > that it establishes a probabilistic framework in Mahout (distributions &
> > Gaussian processes). The GP stuff can also be used to evaluate things
> > like RLFM, I think.
> >
> > I also have a framework to do line-search type of things, including on
> > big datasets, per Nocedal and Wright, including BFGS; those are probably
> > also candidates for contribution. Or not, depending on the moods of my
> > new boss.
> >
> > Of other interesting things that are done with the DSL and may be
> > considered for contribution, I also have implementations for bigram
> > co-occurrence (both directed and undirected) made in the DSL, but they
> > are also quasi-algebraic, I think (meaning there are Spark-specific
> > parts). This (I think) would also include a truthful implementation of
> > the bigram problem from the Surprise & Coincidence paper (currently
> > implemented in Mahout MR) but would also estimate undirected
> > co-occurrences (as a frequent itemsets problem solver/replacement).
> > Again, hopefully it may be contributed, but I'm not sure if I'll pursue
> > that if there's a lack of interest in my company. It's hard to go
> > against the wind, in a way.
> >
> > By far the most often missing piece is data prep, of course, but I think
> > I can eventually contribute a couple of tutorials on how to do
> > vectorization using SparkQL stuff.
> >
> >
> >
> > -d
> >
> >
> >
> >
> > On Mon, Aug 25, 2014 at 2:19 PM, Pat Ferrel <[email protected]> wrote:
> >
> > > Spark RSJ, MAHOUT-1604 is in development
> > >
> > > I thought SSVD with PCA was working on Spark.
> > >
> > >
> > > On Aug 25, 2014, at 2:15 PM, Dmitriy Lyubimov <[email protected]> wrote:
> > >
> > > this is super-cool to hear.
> > >
> > >
> > > On Mon, Aug 25, 2014 at 1:53 PM, Till Rohrmann <[email protected]>
> > > wrote:
> > >
> > > > Hi Andrew,
> > > >
> > > > I like the overview of the different algorithms. The Flink bindings
> > > > are still under development. We hope to finish them in the next
> > > > couple of weeks.
> > > >
> > > > Best regards,
> > > >
> > > > Till
> > > >
> > > >
> > > > On Mon, Aug 25, 2014 at 9:17 PM, Andrew Palumbo <[email protected]>
> > > > wrote:
> > > >
> > > >> I created a "Features by Engine" table from the Mahout "List of
> > > >> Algorithms" page which I'd like to add to the Mahout site once it
> > > >> looks good:
> > > >>
> > > >> https://andrewpalumbo.github.io/algorithms_by_engine
> > > >>
> > > >> I just copied over the current page, and added in some of the stuff
> > > >> that I know is complete/in the works.  I wasn't sure about some of
> > > >> the collaborative filtering stuff.
> > > >>
> > > >> Maybe the whole thing needs to be organized differently?  A separate,
> > > >> totally abstract section for algorithms that will be sitting in
> > > >> math-scala and then a section for each engine's implementation?
> > > >>
> > > >> Also I know that there's been some work done on Flink bindings, but I
> > > >> don't see a specific Jira.  Should I put Flink down as "In
> > > >> development"?
> > > >>
> > > >> Any thoughts are appreciated.
> > > >>
> > > >>
> > > >>
> > > >
> > >
> > >
>
>
