Yes, I usually follow #2 too.

The thing is, pretty often an algorithm can define its own set of strategies
that the backend needs to support (like this distributedEngine strategy),
while still keeping a lot of logic common across all strategies. But if an
all-reduce aggregate operation is incredibly common among such
algorithm-specific strategies, then it stands to reason to implement it only
once.

I have an idea.

Maybe we need a common distributed strategy that is separate from the
algebraic optimizer. That way we don't have to mess with algebraic
rewrites. How about that?

On Wed, Nov 12, 2014 at 9:12 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

> So you are following #2, which is good. #1 seems a bit like a hack. For a
> long time to come we will have to add things to the DSL if it is to be
> kept engine-independent. Yours looks pretty general and simple.
>
> Are you familiar with the existing Mahout aggregate methods? They show up
> in SGDHelper.java and other places in legacy code. I don’t know much
> about them, but they seem to be a pre-functional-programming attempt at
> this kind of thing. It looks like you are proposing a replacement for
> those based on rdd.aggregate; if so, that would be very useful. For one
> thing, it looks like the old aggregate was not parallel, while
> rdd.aggregate is.
>
>
> On Nov 11, 2014, at 1:18 PM, Gokhan Capan <gkhn...@gmail.com> wrote:
>
> So the alternatives are:
>
> 1- mapBlock to a matrix in which all rows but the first are empty, then
> aggregate
> 2- depend on a backend
>
> 1 is obviously OK.
>
> I don't like the idea of depending on a backend since SGD is a generic loss
> minimization, on which other algorithms will possibly depend.
>
> In this context, client-side aggregation is not an overhead, but even if
> it turns out to be, it doesn't have to be a client-side aggregate at all.
>
> As an alternative to 1, I am thinking of at least having an aggregation
> operation, which would return an accumulated value anyway and shouldn't
> affect algebra optimizations.
>
> I quickly implemented a naive one (supporting only Spark; I know I said
> that I don't like depending on a backend, but at least the interface is
> consistent across backends, and as a client, I still don't have to deal
> with Spark primitives directly).
>
> Is this nice enough? Is it too bad to have in the DSL?
> https://github.com/gcapan/mahout/compare/accumulateblocks
>
> Best
>
> Gokhan
>
> On Tue, Nov 11, 2014 at 10:45 PM, Dmitriy Lyubimov <dlie...@gmail.com>
> wrote:
>
> > Oh, the algorithm actually collects the vectors and runs another cycle
> > on the client!
> >
> > Still, technically, you can collect almost-empty blocks to the client
> > (since they are mostly empty, it won't cause THAT huge an overhead
> > compared to collecting single vectors; after all, how many partitions
> > are we talking about? 1000?).
> >
> > On Tue, Nov 11, 2014 at 12:41 PM, Dmitriy Lyubimov <dlie...@gmail.com>
> > wrote:
> >
> >>
> >>
> >> On Sat, Nov 8, 2014 at 12:42 PM, Gokhan Capan <gkhn...@gmail.com>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> Based on Zinkevich et al.'s Parallelized Stochastic Gradient paper (
> >>> http://martin.zinkevich.org/publications/nips2010.pdf), I tried to
> >>> implement SGD, and a regularized least squares solution for linear
> >>> regression (can easily be extended to other GLMs, too).
> >>>
> >>> How the algorithm works is as follows:
> >>> 1. Split data into partitions of T examples
> >>> 2. in parallel, for each partition:
> >>>   2.0. shuffle partition
> >>>   2.1. initialize parameter vector
> >>>   2.2. for each example in the shuffled partition
> >>>       2.2.1 update the parameter vector
> >>> 3. Aggregate all the parameter vectors and return
> >>>
> >>
> >> I guess technically it is possible (transform each block to a
> >> SparseRowMatrix or SparseMatrix with only the first row valid) and then
> >> invoke colSums() or colMeans() (whatever the aggregate means).
> >>
> >> However, I am not sure it is worth the ugliness. Isn't it easier to
> >> declare these things quasi-algebraic and just do direct Spark calls on
> >> the matrix RDD (map, aggregate)?
> >>
> >> The real danger is to introduce non-algebra things into algebra so that
> >> the rest of the algebra doesn't optimize any more.
> >>
> >>
> >
>
>
