Re: Demo for Parallel Core Collection API

Paul Sandoz Tue, 15 Oct 2013 08:22:51 -0700

On Oct 15, 2013, at 4:35 PM, Tristan Yan <[email protected]> wrote:

> Hi Paul
> you have comments "suggest that all streams are sequential. There is an 
> inconsistency in the use and in some cases it is embedded in other stream 
> usages."
> 
> We do not really understand what exactly is meant, could you elaborate a 
> little bit. Is it because we want to show ppl that we should use stream more 
> than parallelStream?

Going parallel is easy to do but not always the right thing to do. Going 
parallel almost always requires more work with the expectation that work will 
complete sooner than the work required to get the same result sequentially. 
There are a number of factors that affect whether parallel is faster than 
sequential. Two of those factors are N, the size of the data, and Q, the cost of 
processing an element in the pipeline. N * Q is a simple cost model: the larger 
that product, the better the chances of parallel speed-up. N is easy to know; Q 
is not so easy but can often be intuitively guessed. (Note that there are other 
factors such as the properties of the stream source and operations that Brian 
and I talked about in our J1 presentation.)
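
As a minimal sketch of the small-Q case, the same sum-of-squares pipeline can 
be run both ways. The class name, method names and the value of N are 
illustrative assumptions, not taken from the demo code under review. Q here is 
a single multiplication, so any parallel gain has to come from a large N, and 
whether it appears at all has to be measured rather than assumed.

```java
import java.util.stream.LongStream;

// Hypothetical demo class: same pipeline, sequential vs. parallel.
class SumOfSquares {

    // Sequential: one thread visits all N elements in order.
    static long sequential(long n) {
        return LongStream.rangeClosed(1, n)
                .map(i -> i * i)
                .sum();
    }

    // Parallel: the range is split and the partial sums are combined.
    // This does extra work (splitting, merging), so it only pays off
    // when N * Q is large enough.
    static long parallel(long n) {
        return LongStream.rangeClosed(1, n)
                .parallel()
                .map(i -> i * i)
                .sum();
    }

    public static void main(String[] args) {
        long n = 1_000_000L; // illustrative N; the sum still fits in a long
        System.out.println("sequential: " + sequential(n));
        System.out.println("parallel:   " + parallel(n));
    }
}
```

Both methods return the same value; only the elapsed time differs, and that 
difference is what a benchmark would need to establish.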

Demo code that just makes everything (or most streams) parallel is sending out 
the wrong message. 

So I think the demo code should present two general things:

1) various stream functionality, as you have done;

2) parallel vs. sequential for various cases where it is known that parallel is 
faster on a multi-core system.

For 2) I strongly recommend measuring using jmh [1]. The data sets you have may 
or may not be amenable to parallel processing; it's worth investigating though.

I have ideas for other parallel demos. One is creating probable primes (now 
that SecureRandom is replaced with ThreadLocalRandom). Creating a probable 
prime that is a BigInteger is a relatively expensive operation, so Q should be 
high. Another, more advanced demo is a Monte-Carlo calculation of PI using 
SplittableRandom and a special Spliterator; in this case N should be largish. 
But there are other, simpler demonstrations, like sum of squares etc., to get 
across that N should be large. Another demo could be calculation of a 
Mandelbrot set, which is embarrassingly parallel over an area in the complex 
plane.
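
A rough sketch of the high-Q case, generating probable primes with a stream 
that can be switched between sequential and parallel. The class name, the 
generate method and the count and bit length are hypothetical choices for 
illustration; only BigInteger.probablePrime and ThreadLocalRandom.current are 
existing JDK calls.

```java
import java.math.BigInteger;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class: N is small, but Q (one probable prime) is costly.
class ProbablePrimes {

    // Generates `count` probable primes of `bitLength` bits each.
    static List<BigInteger> generate(int count, int bitLength, boolean parallel) {
        IntStream indices = IntStream.range(0, count);
        if (parallel) {
            indices = indices.parallel();
        }
        return indices
                // current() is called on whichever thread runs the element,
                // so each worker uses its own random generator.
                .mapToObj(i -> BigInteger.probablePrime(bitLength,
                                                        ThreadLocalRandom.current()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative sizes only; real measurements belong in a benchmark.
        List<BigInteger> primes = generate(16, 512, true);
        primes.forEach(System.out::println);
    }
}
```

Because each element is expensive to produce, even a small count may benefit 
from going parallel, which is the opposite end of the N * Q trade-off from a 
sum of squares.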

So while you should try to fit some parallel vs. sequential execution into 
your existing demos, I do think it is worth having a separate set of demos that 
get across the simple cost model of N * Q. So feel free to use some of the 
ideas mentioned above; I find those ideas fun, so perhaps others will too 
:-)

Paul.

[1] http://openjdk.java.net/projects/code-tools/jmh/

On Oct 15, 2013, at 4:37 PM, Tristan Yan <[email protected]> wrote:

> Also there is one more question I missed
> 
> You suggested ""ParallelCore" is not a very descriptive name. Suggest 
> "streams"."
> 1) yes we agree this demo is not for parallel computation per se
> 2) but we do not have a clear demo for parallel computation
> 3) if we are to rename this, we need to develop another one, do you have a 
> scenario for that?
