Right. Use Spark's API as input.

Dan, if you are extending 'streams' with 'remoteStreams' anyway, you should
be able to extend the API for K-V as well. I haven't gone through Java 8
streams, but one small step for you could be one giant leap into "Big data"
for Gem :-)

All your tool has to be capable of is implementing the "hello world" of big
data: counting words in sentences. :-)
Your output needs to be a K-V collection where the key is the word and the
value is the count. The fastest, most scalable guy wins. And you know what I
am getting at: we are very used to parallel behavior localized to the data,
but we assume a central aggregator. Here you want the aggregator to be
parallelized too. Most common solutions use disk for the shuffle. Gem's
function service can pipeline with its chunking support.
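
For reference, here is a minimal single-JVM sketch of that word count using
plain Java 8 streams. Nothing in it is Gem's API, and the class and method
names are made up for illustration. groupingByConcurrent lets the parallel
worker threads accumulate into one shared concurrent map, so the aggregation
is not funneled through a single thread. It is still one process, though, so
treat it as the shape of the output rather than as a distributed shuffle.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
final class WordCount {

    // Returns a K-V collection: key = lower-cased word, value = occurrence count.
    static ConcurrentMap<String, Long> countWords(List<String> sentences) {
        return sentences.parallelStream()
                // "map" side: split every sentence into words
                .flatMap(s -> Arrays.stream(s.toLowerCase(Locale.ROOT).split("\\W+")))
                // split() can yield an empty leading token, so drop empties
                .filter(w -> !w.isEmpty())
                // "reduce" side: threads update one shared ConcurrentMap by key
                .collect(Collectors.groupingByConcurrent(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> sentences =
                Arrays.asList("the quick brown fox", "the lazy dog", "The fox");
        // Iteration order of the resulting map is unspecified.
        System.out.println(countWords(sentences));
    }
}
```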

After you implement map-reduce, read this perspective from Stonebraker:
https://homes.cs.washington.edu/~billhowe/mapreduce_a_major_step_backwards.html
Just kidding.



On Sun, Aug 16, 2015 at 4:12 PM, Roman Shaposhnik <[email protected]>
wrote:

> On Fri, Aug 14, 2015 at 1:51 PM, Dan Smith <[email protected]> wrote:
> > The java 8 reduce() method returns a scalar. So my .map().reduce() example
> > didn't really have a shuffle phase. We haven't implemented any sort of
> > shuffle, but our reduce is processed on the servers first and then
> > aggregated on the client. I'm not quite sure what the best way to work a
> > shuffle into this stream API would be, actually. I suppose using a map
> > followed by a sort(). We didn't do anything clever with sort either :)
>
> Isn't what you're looking for analogous to reduce() versus reduceByKey()
> in Spark terminology?
>
> Thanks,
> Roman.
>
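
To make the reduce() versus reduceByKey() distinction concrete, here is a
rough sketch on a local Java 8 stream. It only mimics the shape of the two
operations: it does not use Spark's or Gem's API, and the class and method
names are hypothetical. A scalar reduce folds everything into one value, so
there is nothing to shuffle. A keyed reduce produces one value per key, and
that is the step that needs a shuffle once the data is partitioned.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical class, for illustration only.
final class ReduceVsReduceByKey {

    // Scalar reduce: every element folds into a single int.
    static int totalWords(List<String> words) {
        return words.stream().map(w -> 1).reduce(0, Integer::sum);
    }

    // Keyed reduce: values that share a key are merged with Integer::sum,
    // giving one entry per distinct word.
    static Map<String, Integer> countByKey(List<String> words) {
        return words.stream()
                .collect(Collectors.toMap(w -> w, w -> 1, Integer::sum));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a");
        System.out.println(totalWords(words));
        System.out.println(countByKey(words));
    }
}
```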
