On Wed, 17 Nov 2021 17:51:03 GMT, Erik Österlund <eosterl...@openjdk.org> wrote:
> I would be wary to make any API use multiple threads behind the scenes
> without the user explicitly asking for it. While latency of the given
> operation might improve in isolation, parallelization always incurs some
> (often significant) additional cost of computation. This might reduce power
> efficiency, reduce CPU availability for other tasks, and play tricks with
> scalability.
>
> Cost: On my system `multiply` burns about half as many cycles as
> `parallelMultiply`, even though it takes 2.4x longer (measured with
> `-prof perfnorm`).
>
> Scalability: To simulate how well the solution scales, you could try running
> the `multiply` and `parallelMultiply` micros with increasingly large `-t`
> values. (`-t` controls the number of threads running the benchmarks in
> parallel, default 1.) On my system the advantage of `parallelMultiply`
> diminishes markedly as I saturate the system. On a test where I see a 2.4x
> speed-up at `-t 1` (`n = 50000000`), the advantage drops to 1.5x at `-t 4`
> and only gives me a 1.1x speed-up at `-t 8`.

Furthermore, since it uses the common fork/join pool by default, any simultaneous parallel execution (parallel sort, parallel streams, etc.) would decrease the performance.

> I'd favor a public `parallelMultiply()`. Alternatively a flag to opt in to
> parallelization.

The reason I would prefer a public method to a flag is that it might be useful to enable parallel multiply on a case-by-case basis; with a flag, it is an all-or-nothing approach. If it has to be a flag, then I'd agree that it should be opt-in. (See the second sketch at the end of this message for how a caller could decide per call site.)

> (Nit: avoid appending flags to microbenchmarks that aren't strictly
> necessary for the tests, or such that can be reasonably expected to be
> within bounds on any test system. I didn't have 16 GB of free RAM.)

Thanks, will change that. Fun fact: the book Optimizing Java has a graph in the introduction that refers to some tests I did on Fibonacci a long time ago. The GC was dominating in those tests because the heap spaces were too small, but there I was calculating Fibonacci of 1 billion. For "just" 100 million we don't need as much memory: the resident set size for `multiply()` is about 125 MB and for `parallelMultiply()` about 190 MB. Here is the total amount of memory that each of the Fibonacci calculations allocates (a sketch of one way such a Fibonacci micro might look follows the table):

```
BigIntegerParallelMultiply.multiply:           1_000_000    177 MB
BigIntegerParallelMultiply.multiply:          10_000_000   3411 MB
BigIntegerParallelMultiply.multiply:         100_000_000  87689 MB
BigIntegerParallelMultiply.parallelMultiply:   1_000_000    177 MB
BigIntegerParallelMultiply.parallelMultiply:  10_000_000   3411 MB
BigIntegerParallelMultiply.parallelMultiply: 100_000_000  87691 MB
```
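For reference, here is a minimal sketch of the kind of Fibonacci calculation that stresses huge `BigInteger` multiplications, using the standard fast-doubling identities. This is illustrative only, not the actual benchmark code from the PR; the class and method names are made up:

```java
import java.math.BigInteger;

public final class FibonacciSketch {

    /**
     * Fast doubling: given (a, b) = (F(k), F(k+1)), uses
     *   F(2k)   = a * (2b - a)
     *   F(2k+1) = a^2 + b^2
     * and walks the bits of n from the top down.
     */
    static BigInteger fibonacci(long n) {
        BigInteger a = BigInteger.ZERO; // F(0)
        BigInteger b = BigInteger.ONE;  // F(1)
        for (int i = 63 - Long.numberOfLeadingZeros(n); i >= 0; i--) {
            BigInteger c = a.multiply(b.shiftLeft(1).subtract(a)); // F(2k)
            BigInteger d = a.multiply(a).add(b.multiply(b));       // F(2k+1)
            if (((n >>> i) & 1) == 0) {
                a = c;             // advance to (F(2k), F(2k+1))
                b = d;
            } else {
                a = d;             // advance to (F(2k+1), F(2k+2))
                b = c.add(d);
            }
        }
        return a;
    }

    public static void main(String[] args) {
        // Around n = 100_000_000 the intermediate values reach tens of
        // millions of bits, which is where multiply vs parallelMultiply
        // starts to matter.
        System.out.println(fibonacci(1_000).bitLength() + " bits");
    }
}
```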
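And here is a minimal sketch of the case-by-case opt-in I have in mind with a public method. `parallelMultiply` is the method this PR proposes; the threshold constant and the helper are hypothetical and would need tuning on real hardware:

```java
import java.math.BigInteger;

final class MultiplyChooser {
    // Hypothetical cut-off: below this many bits, the fork/join overhead and
    // extra burned cycles likely outweigh any latency win (needs measuring).
    private static final int PARALLEL_THRESHOLD_BITS = 1 << 20;

    static BigInteger multiplyAdaptively(BigInteger x, BigInteger y) {
        if (Math.min(x.bitLength(), y.bitLength()) < PARALLEL_THRESHOLD_BITS) {
            return x.multiply(y);       // small operands: stay on the caller's thread
        }
        return x.parallelMultiply(y);   // huge operands: trade cycles for latency
    }
}
```

With a flag the whole JVM flips at once; with a public method the decision sits next to the operands, where the caller knows their sizes and whether the common pool is already busy.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6409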