The java 8 reduce() method returns a scalar. So my .map().reduce() example
didn't really have a shuffle phase. We haven't implemented any sort of
shuffle, but our reduce is processed on the servers first and then
aggregated on the client. I'm not quite sure what the best way to work a
shuffle into this stream API would be, actually. I suppose using a map
followed by a sort(). We didn't do anything clever with sort either :)

-Dan

On Thu, Aug 13, 2015 at 9:33 PM, Jags Ramnarayanan <[email protected]>
wrote:

> Agree. Could be very useful.
>
> Map reduce functionality has been a customer ASK for many years.
> Have you considered it extending to K-V pairs so the reduce shuffle is on
> the key. This is the essence of Map reduce, right? equivalent of user
> defined aggregation on groups of data.
>
> The distributed, parallel function for the 'map' would then hash on the key
> to route all map results with the same key to some node in the network.
> Essentially a parallel reduce. Maybe already done?
>
>
>
> On Thu, Aug 13, 2015 at 6:07 PM, Sudhir Menon <[email protected]> wrote:
>
> > This is pretty neat. Turn this into a feature request and I am sure
> > people will find this valuable.
> >
> > Suds
> > Sent from my iPhone
> >
> > > On Aug 13, 2015, at 5:24 PM, Dan Smith <[email protected]> wrote:
> > >
> > > Along the same lines as Anthony's email on distributed classloading,
> > QiHong
> > > and I also hacked up a prototype of shipping Java 8 stream operations
> to
> > > gemfire data stores.
> > >
> > > What we did:
> > >
> > > Java 8 allows people to do functional operations on a region using the
> > new
> > > stream API:
> > >
> > > region.entrySet().stream()
> > >        .filter(e -> e.getKey() % 2 == 0)
> > >        .map(e -> e.getValue())
> > >        .reduce(1, Integer::sum);
> > >
> > > It would be much more efficient if the filtering, mapping, reducing,
> etc.
> > > happen on the members that actual host the data, rather than shipping
> all
> > > the data to a single member.
> > >
> > > So we implemented a new method, remoteStream, which collects the
> > operations
> > > as they are applied to a stream. When a terminal operation like reduce
> is
> > > encountered, the whole pipeline is sent out to all members with a
> gemfire
> > > function, and the results of processing that pipeline on the data
> stores
> > > are brought back.
> > >
> > > What we did is basically a quick and dirty proof of concept. There's a
> > lot
> > > to polish up if we want to make this a real geode feature. Is this
> > > something people would be interested in using?
> > >
> > > The proof of concept code is sitting in github if anyone is interested.
> > > Checkout the stream-prototype and look at the tests in
> > StreamsP2PDUnitTest:
> > >
> > > https://github.com/upthewaterspout/incubator-geode
> >
>

Reply via email to