Another place to go with this is to apply an OQL query to generate the stream.

region.entrySet().remoteStream("select * from /myregion.entries e where e.key > 10")
       .filter(e -> e.getKey() % 2 == 0)
       .map(e -> e.getValue())
       .reduce(0, Integer::sum);

Anthony
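
For comparison, here is a minimal sketch of the same pipeline written against a
plain java.util.Map (the map and its contents are hypothetical stand-ins for the
region's entries). It is the standard Java 8 Streams shape that remoteStream
mirrors, with the OQL predicate expressed as a filter; the difference is where
the work runs - remoteStream would evaluate the query and the intermediate steps
on the servers, while this version pulls every entry to the caller first.

import java.util.Map;
import java.util.TreeMap;

public class LocalStreamSketch {
    public static void main(String[] args) {
        // Hypothetical stand-in for the region's entries: keys 0..19, value = key * 10
        Map<Integer, Integer> entries = new TreeMap<>();
        for (int i = 0; i < 20; i++) {
            entries.put(i, i * 10);
        }

        // Same pipeline, evaluated entirely on the client:
        // key > 10 (the OQL predicate), even keys only, then sum the values
        int sum = entries.entrySet().stream()
                .filter(e -> e.getKey() > 10)
                .filter(e -> e.getKey() % 2 == 0)
                .map(Map.Entry::getValue)
                .reduce(0, Integer::sum);

        System.out.println(sum); // 120 + 140 + 160 + 180 = 600
    }
}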



> On Aug 16, 2015, at 10:56 PM, Jags Ramnarayanan <[email protected]> 
> wrote:
> 
> right. Use Spark's API as input.
> 
> Dan, if you are extending 'streams' with 'remoteStreams' anyway, you should
> be able to extend the API for K-V. I haven't gone through Java 8 streams,
> but one small step for you could be one giant leap into "Big data" for Gem
> :-)
> 
> All your tool has to be capable of is implementing the "hello world" for big
> data - counting words in sentences. :-)
> Your output needs to be a k-v collection where the key is the word and v is
> the count. The fastest, most scalable guy wins. And you know what I am getting
> at - we are very used to parallel behavior localized to the data but assume a
> central aggregator. Here you want the aggregator to be parallelized too.
> Most common solutions use disk for shuffle. Gem's function service can
> pipeline with its chunking support.
> 
> After you implement map-reduce read this perspective from Stonebraker -
> https://homes.cs.washington.edu/~billhowe/mapreduce_a_major_step_backwards.html
> Just kidding.
> 
> 
> 
> On Sun, Aug 16, 2015 at 4:12 PM, Roman Shaposhnik <[email protected]>
> wrote:
> 
>> On Fri, Aug 14, 2015 at 1:51 PM, Dan Smith <[email protected]> wrote:
>>> The Java 8 reduce() method returns a scalar. So my .map().reduce() example
>>> didn't really have a shuffle phase. We haven't implemented any sort of
>>> shuffle, but our reduce is processed on the servers first and then
>>> aggregated on the client. I'm not quite sure what the best way to work a
>>> shuffle into this stream API would be, actually. I suppose using a map
>>> followed by a sort(). We didn't do anything clever with sort either :)
>> 
>> Isn't what you're looking for analogous to reduce() versus reduceByKey()
>> in Spark terminology?
>> 
>> Thanks,
>> Roman.
>> 
