As far as I understand, the flexibility idea there is to use streaming
processing, like what the author calls a foldable functor. Is that what you
want to do? Do you want to replicate that functional API?
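
For concreteness, here is a minimal sketch of the "training as a fold over a
monoid" idea being asked about. It is written in Python for brevity and is an
assumption-laden illustration only: GaussianStats, train_one and train are
hypothetical names, not HLearn's or Mahout's API, and a one-dimensional
Gaussian stands in for a real classifier.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class GaussianStats:
    """Hypothetical model: sufficient statistics of a 1-D Gaussian."""
    n: int = 0        # number of points seen
    s: float = 0.0    # sum of the points
    ss: float = 0.0   # sum of the squared points

    def merge(self, other):
        # Associative combine; GaussianStats() acts as the identity element,
        # so (GaussianStats, merge) forms a monoid.
        return GaussianStats(self.n + other.n, self.s + other.s, self.ss + other.ss)

    def mean(self):
        return self.s / self.n

    def variance(self):
        # Population variance recovered from the sufficient statistics.
        m = self.mean()
        return self.ss / self.n - m * m


def train_one(x):
    # "Train" a model on a single data point.
    return GaussianStats(1, x, x * x)


def train(xs):
    # Batch training is a fold of merge over the per-point models.
    return reduce(GaussianStats.merge, map(train_one, xs), GaussianStats())


# Parallel training: train each partition separately, then merge the models.
left, right = train([1, 2]), train([3, 4])
combined = left.merge(right)

# Online training: merge one more point into an existing model.
updated = combined.merge(train_one(5))
```

Because merge is associative, the same fold can be run over a stream, split
across partitions, or reused for cross-validation by merging per-fold models
rather than retraining from scratch.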

On Tue, Sep 29, 2015 at 2:17 PM, alxsmac733 . <[email protected]>
wrote:

> Hi Dmitriy,
>
> Apologies for not getting back to you sooner - I just got married and was
> away on my honeymoon.
>
> Having taken a closer look at the direction you're trying to take Mahout in,
> while I agree that the approach extolled in the paper is not necessarily
> completely in line with batch-algebraic problems, I believe it is in a
> similar spirit. Additionally, I think having algebraic semantics for things
> like models fits in well with the goal of making Mahout more of a
> programming environment than a collection of black-box algorithms.
>
> As for which specific additions should be made, I'm open to suggestions
> and I'd love to discuss the matter further. Regarding your point about
> low-level speed-ups, unfortunately I'm not a JVM expert, so I probably
> couldn't help much on that front.
>
> - Alex
> On Sep 14, 2015 2:00 PM, "Dmitriy Lyubimov" <[email protected]> wrote:
>
> > Also, as far as I understand, the author does a lot in terms of low-level
> > speed-ups -- using fast numeric libraries, packing memory-fragmented
> > object trees into contiguous cache-friendly representations (something I
> > fought for years in Java, and then in part in Scala -- this is my JVM
> > rant #1). Mahout notoriously lacks these techniques. But without them,
> > the speed-ups are probably not achievable by the monoid architecture
> > alone (I may be wrong). What are your thoughts on this? All these
> > problems are very welcome to be solved in Mahout, but I expect they would
> > require a significant time commitment, IMO.
> >
> > On Mon, Sep 14, 2015 at 10:41 AM, Dmitriy Lyubimov <[email protected]>
> > wrote:
> >
> > > Alex,
> > >
> > > so these papers seem to mainly show the adaptation of different
> > > algorithms to a monoid architecture, i.e. online training (or parallel
> > > online training). Although IMO this means they are not necessarily the
> > > batch-algebraic problems we have been working towards recently (i.e.
> > > the "distributed R" notion), I suppose they would make a fine
> > > architectural addition on their own.
> > >
> > > Which parts of Mahout are you suggesting to reuse for these methods?
> > > Also, the papers show adaptations for several classifiers; which ones
> > > do you suggest starting with?
> > >
> > > Thank you for doing this.
> > >
> > > -D
> > >
> > > On Fri, Sep 4, 2015 at 2:41 PM, alxsmac733 . <[email protected]>
> > > wrote:
> > >
> > >> My pleasure!
> > >>
> > >> On Fri, Sep 4, 2015 at 4:03 PM, Andrew Musselman <[email protected]> wrote:
> > >>
> > >> > Thanks Alex; grateful for the help.
> > >> >
> > >> > On Fri, Sep 4, 2015 at 12:59 PM, alxsmac733 . <[email protected]> wrote:
> > >> >
> > >> > > Hi Dmitriy,
> > >> > >
> > >> > > That sounds more than reasonable - take as much time as you need.
> > >> > > I'll be away for the next two weeks anyway, so I won't be able to
> > >> > > start working on this until I get back, should you want me to move
> > >> > > forward with the proposal.
> > >> > >
> > >> > > - Alex
> > >> > > On Sep 4, 2015 1:46 PM, "Dmitriy Lyubimov" <[email protected]> wrote:
> > >> > >
> > >> > > > Alex,
> > >> > > >
> > >> > > > can you give us a week or so to look it over?
> > >> > > >
> > >> > > > We have been discussing hyperparameter fitting approaches for a
> > >> > > > while, and it is fairly high on our roadmap (cross-validation is
> > >> > > > of course an important element of it). We need to figure out how
> > >> > > > it may fit together; but don't get discouraged if we don't get
> > >> > > > back to you immediately, we need time to digest your proposal.
> > >> > > >
> > >> > > > -d
> > >> > > >
> > >> > > > On Fri, Sep 4, 2015 at 10:26 AM, alxsmac733 . <[email protected]> wrote:
> > >> > > >
> > >> > > > > The fast cross-validation algorithm might be a good place to
> > >> > > > > start, as it may be the most broadly useful.
> > >> > > > >
> > >> > > > > Any advice on how to get started would be greatly appreciated -
> > >> > > > > I want to make sure I do a good job and that it fits well with
> > >> > > > > the overall aims of Mahout.
> > >> > > > >
> > >> > > > > On Fri, Sep 4, 2015 at 1:12 PM, Andrew Musselman <[email protected]> wrote:
> > >> > > > >
> > >> > > > > > Sounds interesting; what part would you like to start with?
> > >> > > > > >
> > >> > > > > > If you need help getting started we're happy to point you in a
> > >> > > > > > good direction.
> > >> > > > > >
> > >> > > > > > On Fri, Sep 4, 2015 at 9:55 AM, alxsmac733 . <[email protected]> wrote:
> > >> > > > > >
> > >> > > > > > > Hi everyone,
> > >> > > > > > >
> > >> > > > > > > Would there be any interest in adding algebraic
> > >> > > > > > > classification methods to Mahout?  It's an elegant approach
> > >> > > > > > > that allows for easy online and parallel training as well as
> > >> > > > > > > fast cross-validation.  Below are some links describing the
> > >> > > > > > > approach as well as an existing Haskell package implemented
> > >> > > > > > > by the author.  The first paper does a very good job of
> > >> > > > > > > explaining the basic concepts clearly and concisely.
> > >> > > > > > >
> > >> > > > > > > https://izbicki.me/public/papers/icml2013-algebraic-classifiers.pdf
> > >> > > > > > > https://izbicki.me/public/papers/tfp2013-hlearn-a-machine-learning-library-for-haskell.pdf
> > >> > > > > > > https://izbicki.me/
> > >> > > > > > > https://github.com/mikeizbicki/HLearn
> > >> > > > > > >
> > >> > > > > > > The author saw a very large speed-up from implementing these
> > >> > > > > > > techniques when compared with popular existing libraries
> > >> > > > > > > such as Weka.  Aside from the potential performance gains to
> > >> > > > > > > be had, I think imposing algebraic structure provides a nice
> > >> > > > > > > layer of abstraction over the particular models being
> > >> > > > > > > implemented.
> > >> > > > > > >
> > >> > > > > > > I'd love to hear everyone's feedback on this.  Thanks for
> > >> > > > > > > your time and enjoy your weekends!
> > >> > > > > > >
> > >> > > > > > > Alex Moreno
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
