Congratulations, by the way!

On Tue, Sep 29, 2015 at 3:14 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> as far as i understand, the flexibility idea there is to use streaming
> processing like what author calls foldable functor. Is that what you want
> to do? Do you want to repeat that functional API?
>
> On Tue, Sep 29, 2015 at 2:17 PM, alxsmac733 . <ajmoreno1...@gmail.com>
> wrote:
>
>> Hi Dmitriy,
>>
>> Apologies for not getting back to you sooner - I just got married and was
>> away on my honeymoon.
>>
>> Having taken a closer look at the direction you're trying to take Mahout,
>> while I agree that the approach extolled in the paper is not necessarily
>> completely in line with batch - algebraic problems, I believe it is in a
>> similar spirit. Additionally, I think having algebraic semantics for
>> things like models fits in well with the goal of making Mahout more of a
>> programming environment than a collection of blackbox algorithms.
>>
>> In terms of what specific additions should be made, I'm open in terms of
>> suggestions and I'd love to discuss the matter further. Per your point
>> about low-level speedups, unfortunately I'm not a JVM expert so I probably
>> couldn't help too much on that front.
>>
>> - Alex
>> On Sep 14, 2015 2:00 PM, "Dmitriy Lyubimov" <dlie...@gmail.com> wrote:
>>
>> > Also. as far as i understand, the author does a lot in terms of
>> > low-level speed ups -- using fast numeric libraries, packing
>> > memory-fragmented object trees into continuous cache-friendly
>> > representations (something i fought for years in java, and then in
>> > part in Scala -- this is my JVM rant # 1). Mahout notoriously lacks
>> > these techniques. But without these techniques, the speed-ups are
>> > probably not realistic by the monoid architecture alone (i may be
>> > wrong). What are your thoughts in these respects? All these problems
>> > are very welcome to be solved in Mahout. But I expect they'd require
>> > some significant time commitment IMO.
>> >
>> > On Mon, Sep 14, 2015 at 10:41 AM, Dmitriy Lyubimov <dlie...@gmail.com>
>> > wrote:
>> >
>> > > Alex,
>> > >
>> > > so these papers seem to mainly show adaptation of different
>> > > algorithms to a monoid architecture, i.e. online training (or
>> > > parallel online training). Although IMO these makes it not
>> > > necessarily batch-algebraic problems towards which we were working
>> > > recently (i.e. "distributed R" notion), I suppose they would make a
>> > > fine architecture addition on their own.
>> > >
>> > > What parts of Mahout you are suggesting to reuse for these methods?
>> > > Also, the papers show adaptation for several classifiers, which ones
>> > > do you suggest to start with?
>> > >
>> > > Thank you for doing this.
>> > >
>> > > -D
>> > >
>> > > On Fri, Sep 4, 2015 at 2:41 PM, alxsmac733 . <ajmoreno1...@gmail.com>
>> > > wrote:
>> > >
>> > >> My pleasure!
>> > >>
>> > >> On Fri, Sep 4, 2015 at 4:03 PM, Andrew Musselman
>> > >> <andrew.mussel...@gmail.com> wrote:
>> > >>
>> > >> > Thanks Alex; grateful for the help.
>> > >> >
>> > >> > On Fri, Sep 4, 2015 at 12:59 PM, alxsmac733 .
>> > >> > <ajmoreno1...@gmail.com> wrote:
>> > >> >
>> > >> > > Hi Dmitriy,
>> > >> > >
>> > >> > > That sounds more than reasonable - take as much time as you
>> > >> > > need. I'll be away for the next two weeks anyway so I won't be
>> > >> > > able to start working on this until I get back should you want
>> > >> > > me to move forward with the proposal.
>> > >> > >
>> > >> > > - Alex
>> > >> > > On Sep 4, 2015 1:46 PM, "Dmitriy Lyubimov" <dlie...@gmail.com>
>> > >> > > wrote:
>> > >> > >
>> > >> > > > Alex,
>> > >> > > >
>> > >> > > > can you give us a week or so to look it over?
>> > >> > > >
>> > >> > > > We have been discussing for a while hyperparameter fitting
>> > >> > > > approaches and it is fairly high on our roadmap
>> > >> > > > (crossvalidation is of course an important element of it).
>> > >> > > > We need to figure how it may fit together; but don't get
>> > >> > > > discouraged if we don't get immediately back to you, we need
>> > >> > > > time to digest your proposal.
>> > >> > > >
>> > >> > > > -d
>> > >> > > >
>> > >> > > > On Fri, Sep 4, 2015 at 10:26 AM, alxsmac733 .
>> > >> > > > <ajmoreno1...@gmail.com> wrote:
>> > >> > > >
>> > >> > > > > The fast cross-validation algorithm might be a good place
>> > >> > > > > to start as it may be the most broadly useful.
>> > >> > > > >
>> > >> > > > > Any advice on how to get started would be greatly
>> > >> > > > > appreciated - I want to make sure I do a good job and it
>> > >> > > > > fits well with the overall aims of Mahout.
>> > >> > > > >
>> > >> > > > > On Fri, Sep 4, 2015 at 1:12 PM, Andrew Musselman
>> > >> > > > > <andrew.mussel...@gmail.com> wrote:
>> > >> > > > >
>> > >> > > > > > Sounds interesting; what part would you like to start
>> > >> > > > > > with?
>> > >> > > > > >
>> > >> > > > > > If you need help getting started we're happy to point
>> > >> > > > > > you in a good direction.
>> > >> > > > > >
>> > >> > > > > > On Fri, Sep 4, 2015 at 9:55 AM, alxsmac733 .
>> > >> > > > > > <ajmoreno1...@gmail.com> wrote:
>> > >> > > > > >
>> > >> > > > > > > Hi everyone,
>> > >> > > > > > >
>> > >> > > > > > > Would there be any interest in adding algebraic
>> > >> > > > > > > classification methods to Mahout? It's an elegant
>> > >> > > > > > > approach that allows for easy online and parallel
>> > >> > > > > > > training as well as fast cross-validation.
>> > >> > > > > > > Below are some links describing the approach as well
>> > >> > > > > > > as an existing Haskell package implemented by the
>> > >> > > > > > > author. The first paper does a very good job of
>> > >> > > > > > > explaining the basic concepts clearly and concisely.
>> > >> > > > > > >
>> > >> > > > > > > https://izbicki.me/public/papers/icml2013-algebraic-classifiers.pdf
>> > >> > > > > > > https://izbicki.me/public/papers/tfp2013-hlearn-a-machine-learning-library-for-haskell.pdf
>> > >> > > > > > > https://izbicki.me/
>> > >> > > > > > > https://github.com/mikeizbicki/HLearn
>> > >> > > > > > >
>> > >> > > > > > > The author saw a very large speed up implementing
>> > >> > > > > > > these techniques when compared with popular existing
>> > >> > > > > > > libraries such as Weka. Aside from the potential
>> > >> > > > > > > performance gains to be had, I think imposing
>> > >> > > > > > > algebraic structure provides a nice layer of
>> > >> > > > > > > abstraction over the particular models being
>> > >> > > > > > > implemented.
>> > >> > > > > > >
>> > >> > > > > > > I'd love to hear everyone's feedback on this. Thanks
>> > >> > > > > > > for your time and enjoy your weekends!
>> > >> > > > > > >
>> > >> > > > > > > Alex Moreno