Just following up on this - I created GEODE-262 to track this feature request.
Thanks!
-Dan

On Tue, Aug 18, 2015 at 11:43 AM, Anthony Baker <[email protected]> wrote:

> Another place to go with this is to apply an OQL query to generate the
> stream.
>
> region.entrySet().remoteStream("select * from /myregion.entries e where e.key > 10")
>     .filter(e -> e.getKey() % 2 == 0)
>     .map(e -> e.getValue())
>     .reduce(1, Integer::sum);
>
> Anthony
>
>
> > On Aug 16, 2015, at 10:56 PM, Jags Ramnarayanan <[email protected]> wrote:
> >
> > Right. Use Spark's API as input.
> >
> > Dan, if you are extending 'streams' with 'remoteStreams' anyway, you
> > should be able to extend the API for K-V. I haven't gone through Java 8
> > streams, but one small step for you could be one giant leap into "Big
> > data" for Gem :-)
> >
> > All your tool has to be able to do is implement the "hello world" of big
> > data - count words in sentences. :-)
> > Your output needs to be a k-v collection where the key is the word and v
> > is the count. The fastest, most scalable implementation wins. And you
> > know what I am getting at - we are very used to parallel behavior
> > localized to the data but assume a central aggregator. Here you want the
> > aggregator to be parallelized too. Most common solutions use disk for
> > the shuffle. Gem's function service can pipeline with its chunking
> > support.
> >
> > After you implement map-reduce, read this perspective from Stonebraker -
> > https://homes.cs.washington.edu/~billhowe/mapreduce_a_major_step_backwards.html
> > Just kidding.
> >
> >
> > On Sun, Aug 16, 2015 at 4:12 PM, Roman Shaposhnik <[email protected]>
> > wrote:
> >
> >> On Fri, Aug 14, 2015 at 1:51 PM, Dan Smith <[email protected]> wrote:
> >>> The Java 8 reduce() method returns a scalar, so my .map().reduce()
> >>> example didn't really have a shuffle phase. We haven't implemented any
> >>> sort of shuffle, but our reduce is processed on the servers first and
> >>> then aggregated on the client. I'm not quite sure what the best way to
> >>> work a shuffle into this stream API would be, actually. I suppose
> >>> using a map followed by a sort(). We didn't do anything clever with
> >>> sort either :)
> >>
> >> Isn't what you're looking for analogous to reduce() versus reduceByKey()
> >> in Spark terminology?
> >>
> >> Thanks,
> >> Roman.
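
For anyone who wants to see what Anthony's proposed pipeline would compute, here is a
purely local sketch against a plain java.util.HashMap using only the stock
java.util.stream API. The remoteStream() method and the OQL seeding are only proposals
in the thread above and don't exist yet, so the OQL "where e.key > 10" predicate is
modeled here as an ordinary filter():

import java.util.HashMap;
import java.util.Map;

public class LocalPipelineSketch {
    public static void main(String[] args) {
        // Stand-in for a Geode region: a plain map of Integer keys to Integer values.
        Map<Integer, Integer> region = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            region.put(i, i * 10);
        }

        // Local equivalent of the proposed pipeline: the OQL predicate becomes a
        // filter(), then keep even keys, pull out the values, and sum them
        // (using 0 as the identity for addition).
        int sum = region.entrySet().stream()
                .filter(e -> e.getKey() > 10)
                .filter(e -> e.getKey() % 2 == 0)
                .map(Map.Entry::getValue)
                .reduce(0, Integer::sum);

        System.out.println("sum = " + sum);
    }
}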
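
And to make Roman's reduce() vs. reduceByKey() point concrete against Jags's word-count
challenge: with stock Java 8 streams, reduce() can only produce a single scalar, while
the per-key ("reduceByKey"-style) result comes from collect() with a grouping collector.
A minimal local sketch, no Geode involved:

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCountSketch {
    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "the quick brown fox",
                "the lazy dog",
                "the quick dog");

        // reduce() collapses the stream to one scalar: the total number of words.
        long total = sentences.stream()
                .flatMap(s -> Arrays.stream(s.split("\\s+")))
                .map(w -> 1L)
                .reduce(0L, Long::sum);

        // The reduceByKey-style result is a Map<word, count>, produced by collect()
        // with a grouping collector rather than by reduce().
        Map<String, Long> counts = sentences.stream()
                .flatMap(s -> Arrays.stream(s.split("\\s+")))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        System.out.println(total);   // 10
        System.out.println(counts);  // e.g. {brown=1, lazy=1, the=3, quick=2, fox=1, dog=2}
    }
}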
