Also, as far as I understand, the author does a lot in terms of low-level
speed-ups: using fast numeric libraries and packing memory-fragmented object
trees into contiguous, cache-friendly representations (something I fought
for years in Java, and then in part in Scala; this is my JVM rant #1).
Mahout notoriously lacks these techniques, and without them the reported
speed-ups are probably not achievable through the monoid architecture alone
(I may be wrong). What are your thoughts on this? Solving these
problems in Mahout would be very welcome, but I expect it would
require a significant time commitment.
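
For reference, the monoid idea under discussion can be sketched in a few
lines. This is only an illustration under stated assumptions: it is plain
Python rather than Mahout or HLearn code, the names Moments and
leave_one_fold_out are made up for this example, and the "model" is just the
sufficient statistics of a 1-D sample. It shows the algebraic part only
(merge-based parallel/online training and cross-validation with one training
pass per fold), not the low-level memory packing mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Moments:
    """Hypothetical toy model: count, sum and sum of squares of a 1-D sample."""
    n: int = 0
    s: float = 0.0
    ss: float = 0.0

    def merge(self, other: "Moments") -> "Moments":
        # Associative, with Moments() as the identity element: the monoid operation.
        return Moments(self.n + other.n, self.s + other.s, self.ss + other.ss)

    @staticmethod
    def train(xs) -> "Moments":
        # "Training" here is just accumulating the sufficient statistics.
        xs = list(xs)
        return Moments(len(xs), float(sum(xs)), float(sum(x * x for x in xs)))

    def mean(self) -> float:
        return self.s / self.n


def leave_one_fold_out(folds):
    """For each fold i, return the model built from every fold except i.

    Each fold is trained exactly once; prefix and suffix merges then yield
    all k held-out models using O(k) merges instead of k full retrainings.
    """
    models = [Moments.train(f) for f in folds]
    k = len(models)

    prefix = [Moments()]              # prefix[i] = merge of models[0 .. i-1]
    for m in models:
        prefix.append(prefix[-1].merge(m))

    suffix = [Moments()] * (k + 1)    # suffix[i] = merge of models[i .. k-1]
    for i in range(k - 1, -1, -1):
        suffix[i] = models[i].merge(suffix[i + 1])

    return [prefix[i].merge(suffix[i + 1]) for i in range(k)]
```

Because merge is associative, chunks can be trained on separate workers and
combined in any grouping, and an online update is just a merge with a model
trained on the new data. How much of the reported speed-up comes from this
structure rather than from low-level optimization is exactly the open
question above.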

On Mon, Sep 14, 2015 at 10:41 AM, Dmitriy Lyubimov <[email protected]>
wrote:

> Alex,
>
> so these papers seem mainly to show how to adapt different algorithms to
> a monoid architecture, i.e. online training (or parallel online training).
> Although IMO this means they are not necessarily the batch-algebraic
> problems we have been working toward recently (i.e. the "distributed R"
> notion), I suppose they would make a fine architectural addition on their own.
>
> Which parts of Mahout are you suggesting we reuse for these methods?
> Also, the papers show adaptations for several classifiers; which ones do
> you suggest starting with?
>
> Thank you for doing this.
>
> -D
>
> On Fri, Sep 4, 2015 at 2:41 PM, alxsmac733 . <[email protected]>
> wrote:
>
>> My pleasure!
>>
>> On Fri, Sep 4, 2015 at 4:03 PM, Andrew Musselman <[email protected]> wrote:
>>
>> > Thanks Alex; grateful for the help.
>> >
>> > On Fri, Sep 4, 2015 at 12:59 PM, alxsmac733 . <[email protected]>
>> > wrote:
>> >
>> > > Hi Dmitriy,
>> > >
>> > > That sounds more than reasonable - take as much time as you need. I'll be
>> > > away for the next two weeks anyway so I won't be able to start working on
>> > > this until I get back should you want me to move forward with the proposal.
>> > >
>> > > - Alex
>> > > On Sep 4, 2015 1:46 PM, "Dmitriy Lyubimov" <[email protected]> wrote:
>> > >
>> > > > Alex,
>> > > >
>> > > > can you give us a week or so to look it over?
>> > > >
>> > > > We have been discussing for a while hyperparameter fitting approaches and
>> > > > it is fairly high on our roadmap (crossvalidation is of course an important
>> > > > element of it). We need to figure how it may fit together; but don't get
>> > > > discouraged if we don't get immediately back to you, we need time to digest
>> > > > your proposal.
>> > > >
>> > > > -d
>> > > >
>> > > > On Fri, Sep 4, 2015 at 10:26 AM, alxsmac733 . <[email protected]> wrote:
>> > > >
>> > > > > The fast cross-validation algorithm might be a good place to start as it
>> > > > > may be the most broadly useful.
>> > > > >
>> > > > > Any advice on how to get started would be greatly appreciated - I want to
>> > > > > make sure I do a good job and it fits well with the overall aims of Mahout.
>> > > > >
>> > > > > On Fri, Sep 4, 2015 at 1:12 PM, Andrew Musselman <[email protected]> wrote:
>> > > > >
>> > > > > > Sounds interesting; what part would you like to start with?
>> > > > > >
>> > > > > > If you need help getting started we're happy to point you in a good
>> > > > > > direction.
>> > > > > >
>> > > > > > On Fri, Sep 4, 2015 at 9:55 AM, alxsmac733 . <[email protected]> wrote:
>> > > > > >
>> > > > > > > Hi everyone,
>> > > > > > >
>> > > > > > > Would there be any interest in adding algebraic classification methods to
>> > > > > > > Mahout?  It's an elegant approach that allows for easy online and parallel
>> > > > > > > training as well as fast cross-validation.  Below are some links describing
>> > > > > > > the approach as well as an existing Haskell package implemented by the
>> > > > > > > author.  The first paper does a very good job of explaining the basic
>> > > > > > > concepts clearly and concisely.
>> > > > > > >
>> > > > > > > https://izbicki.me/public/papers/icml2013-algebraic-classifiers.pdf
>> > > > > > > https://izbicki.me/public/papers/tfp2013-hlearn-a-machine-learning-library-for-haskell.pdf
>> > > > > > > https://izbicki.me/
>> > > > > > > https://github.com/mikeizbicki/HLearn
>> > > > > > >
>> > > > > > > The author saw a very large speed up implementing these techniques when
>> > > > > > > compared with popular existing libraries such as Weka.  Aside from the
>> > > > > > > potential performance gains to be had, I think imposing algebraic structure
>> > > > > > > provides a nice layer of abstraction over the particular models being
>> > > > > > > implemented.
>> > > > > > >
>> > > > > > > I'd love to hear everyone's feedback on this.  Thanks for your time and
>> > > > > > > enjoy your weekends!
>> > > > > > >
>> > > > > > > Alex Moreno
