Re: [OpenJDK 2D-Dev] AAShapePipe concurrency & memory waste

Peter Levart Wed, 10 Apr 2013 14:17:11 -0700

Hi Laurent,

Could you disable tiered compilation for performance tests? Tiered compilation is usually a source of jitter in the results. Pass -XX:-TieredCompilation to the VM.
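
To make sure a benchmark run really started with the flag, something like the following could be called at startup. This is only a sketch: the class name TieredFlagCheck and its helper are made up for illustration, and it only inspects the arguments reported by RuntimeMXBean.getInputArguments(), so it says nothing about the JVM's default setting.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of the benchmark classes discussed in this thread.
final class TieredFlagCheck {

    // True only if the flag was passed explicitly among the given JVM arguments.
    static boolean tieredDisabled(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:-TieredCompilation");
    }

    public static void main(String[] args) {
        // Arguments passed to the running JVM (not the program arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM args: " + jvmArgs);
        System.out.println("Tiered compilation explicitly disabled: " + tieredDisabled(jvmArgs));
    }
}
```

Run with: java -XX:-TieredCompilation TieredFlagCheck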


Regards, Peter


On 04/10/2013 10:58 AM, Laurent Bourgès wrote:

Dear Jim,
2013/4/9 Jim Graham <[email protected]>:
    The allocations will always show up on a heap profiler, I don't
    know of any way of having them not show up if they are stack
    allocated, but I don't think that stack allocation is the issue
    here - small allocations come out of a fast generation that costs
    almost nothing to allocate from and nearly nothing to clean up.
     They are actually getting allocated and GC'd, but the process is
    optimized.

    The only way to tell is to benchmark and see which changes make a
    difference and which are in the noise (or, in some odd
    counter-intuitive cases, counter-productive)...

                            ...jim
I admit I like GC because it spares Java from dealing with pointers the way C/C++ does; however, I prefer that GC cleans up real garbage (application...) rather than wasted memory: I prefer not to count on GC when I can avoid wasting memory that gives GC more work = reduce useless garbage (save the planet)!
Moreover, GC and/or thread-local allocation (TLAB) seem to have more overhead than you think = "fast generation that costs almost nothing to allocate from and nearly nothing to clean up".
Here are my micro-benchmark results for int[4] allocation, where I mimic the AAShapePipe.fillParallelogram() method:
Threads   Patch (ns/op)   Ref (ns/op)   Gain (Ref / Patch)
1         5,96            8,27          138,76%
2         7,31            14,96         204,65%
3         10,65           20,40         191,55%
4         15,44           29,83         193,20%
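
For readers checking the numbers: the Gain column appears to be the ratio Ref / Patch expressed as a percentage (for example 8,27 / 5,96 gives about 138,76%). A small sketch that recomputes it from the ns/op values in the table; the class and method names are made up for illustration.

```java
import java.util.Locale;

// Hypothetical helper that recomputes the Gain column from the Patch and Ref timings.
final class GainTable {

    // Gain in percent, assuming Gain = Ref / Patch (both in ns/op).
    static double gainPercent(double refNsPerOp, double patchNsPerOp) {
        return 100.0 * refNsPerOp / patchNsPerOp;
    }

    public static void main(String[] args) {
        // {patch, ref} pairs copied from the results table above.
        double[][] rows = {
            {5.96, 8.27}, {7.31, 14.96}, {10.65, 20.40}, {15.44, 29.83}
        };
        for (double[] row : rows) {
            System.out.printf(Locale.ROOT, "patch=%.2f ref=%.2f gain=%.2f%%%n",
                    row[0], row[1], gainPercent(row[1], row[0]));
        }
    }
}
```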


Test environment:
Linux64 with OpenJDK8 (2 real cpu cores, 4 virtual cpus)
JVM settings:
-XX:+PrintCommandLineFlags -XX:-PrintFlagsFinal -Xms128m  -Xmx128m

Benchmark code (using Peter Levart microbench classes):
http://jmmc.fr/~bourgesl/share/AAShapePipe/microbench/
My conclusion is: "nothing" > zero (allocation + cleanup), and it is very noticeable in multi-threaded tests.
I admit that I use a dirty int[4] array (no cleanup) but it is not necessary: maybe the performance gain comes from that.
Finally, here is the output with the -XX:+PrintTLAB flag:
TLAB: gc thread: 0x00007f105813d000 [id: 4053] desired_size: 1312KB slow allocs: 0 refill waste: 20992B alloc: 1,00000 65600KB refills: 20 waste 1,2% gc: 323712B slow: 600B fast: 0B
TLAB: gc thread: 0x00007f105813a800 [id: 4052] desired_size: 1312KB slow allocs: 0 refill waste: 20992B alloc: 1,00000 65600KB refills: 7 waste 7,9% gc: 745568B slow: 176B fast: 0B
TLAB: gc thread: 0x00007f1058138800 [id: 4051] desired_size: 1312KB slow allocs: 0 refill waste: 20992B alloc: 1,00000 65600KB refills: 15 waste 3,1% gc: 618464B slow: 448B fast: 0B
TLAB: gc thread: 0x00007f1058136800 [id: 4050] desired_size: 1312KB slow allocs: 0 refill waste: 20992B alloc: 1,00000 65600KB refills: 7 waste 0,0% gc: 0B slow: 232B fast: 0B
TLAB: gc thread: 0x00007f1058009000 [id: 4037] desired_size: 1312KB slow allocs: 0 refill waste: 20992B alloc: 1,00000 65600KB refills: 1 waste 27,5% gc: 369088B slow: 0B fast: 0B
TLAB totals: thrds: 5 refills: 50 max: 20 slow allocs: 0 max 0 waste: 3,1% gc: 2056832B max: 745568B slow: 1456B max: 600B fast: 0B max: 0B
I would have expected that TLAB can recycle all useless int[4] arrays as fast as possible => waste = 100% ???
Is there any bug in TLAB (core-libs)?
Should I send this issue to the hotspot team?

*Test using ThreadLocal AAShapePipeContext:*
{
    AAShapePipeContext ctx = getThreadContext();
    int abox[] = ctx.abox;

    // use array:
    // mimic: AATileGenerator aatg = renderengine.getAATileGenerator(x, y, dx1, dy1, dx2, dy2, 0, 0, clip, abox);
    abox[0] = 7;
    abox[1] = 11;
    abox[2] = 13;
    abox[3] = 17;
    // mimic: renderTiles(sg, computeBBox(ux1, uy1, ux2, uy2), aatg, abox);
    devNull1.yield(abox);

    if (!useThreadLocal) {
        restoreContext(ctx);
    }
}
-XX:ClassMetaspaceSize=104857600 -XX:InitialHeapSize=134217728 -XX:MaxHeapSize=134217728 -XX:+PrintCommandLineFlags -XX:-PrintFlagsFinal -XX:+UseCompressedKlassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
>> JVM START: 1.8.0-internal [OpenJDK 64-Bit Server VM 25.0-b24]
#-------------------------------------------------------------
# ContextGetInt4: run duration: 10 000 ms
#
# Warm up:
# 4 threads, Tavg = 13,84 ns/op (σ = 0,23 ns/op), Total ops = 2889056179 [ 13,93 (717199825), 13,87 (720665624), 13,48 (741390545), 14,09 (709800185)]
# 4 threads, Tavg = 14,25 ns/op (σ = 0,57 ns/op), Total ops = 2811615084 [ 15,21 (658351236), 14,18 (706254551), 13,94 (718202949), 13,74 (728806348)]
cleanup (explicit Full GC) ...
cleanup done.
# Measure:
1 threads, Tavg = 5,96 ns/op (σ = 0,00 ns/op), Total ops = 1678357614 [ 5,96 (1678357614)]
2 threads, Tavg = 7,33 ns/op (σ = 0,03 ns/op), Total ops = 2729723450 [ 7,31 (1369694121), 7,36 (1360029329)]
3 threads, Tavg = 10,65 ns/op (σ = 2,73 ns/op), Total ops = 2817154340 [ 13,24 (755190111), 13,23 (755920429), 7,66 (1306043800)]
4 threads, Tavg = 15,44 ns/op (σ = 3,33 ns/op), Total ops = 2589897733 [ 17,05 (586353618), 19,23 (519345153), 17,88 (559401974), 10,81 (924796988)]
#
<< JVM END

*Test using standard int[4] allocation:*
{
    int abox[] = new int[4];

    // use array:
    // mimic: AATileGenerator aatg = renderengine.getAATileGenerator(x, y, dx1, dy1, dx2, dy2, 0, 0, clip, abox);
    abox[0] = 7;
    abox[1] = 11;
    abox[2] = 13;
    abox[3] = 17;
    // mimic: renderTiles(sg, computeBBox(ux1, uy1, ux2, uy2), aatg, abox);
    devNull1.yield(abox);
}
-XX:ClassMetaspaceSize=104857600 -XX:InitialHeapSize=134217728 -XX:MaxHeapSize=134217728 -XX:+PrintCommandLineFlags -XX:-PrintFlagsFinal -XX:+UseCompressedKlassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
>> JVM START: 1.8.0-internal [OpenJDK 64-Bit Server VM 25.0-b24]
#-------------------------------------------------------------
# GetInt4: run duration: 10 000 ms
#
# Warm up:
# 4 threads, Tavg = 31,07 ns/op (σ = 0,60 ns/op), Total ops = 1287292142 [ 30,26 (330475567), 31,92 (313328449), 31,27 (319805520), 30,89 (323682606)]
# 4 threads, Tavg = 30,94 ns/op (σ = 0,33 ns/op), Total ops = 1293000783 [ 30,92 (323382193), 30,61 (326730340), 31,48 (317621402), 30,74 (325266848)]
cleanup (explicit Full GC) ...
cleanup done.
# Measure:
1 threads, Tavg = 8,27 ns/op (σ = 0,00 ns/op), Total ops = 1209213909 [ 8,27 (1209213909)]
2 threads, Tavg = 14,96 ns/op (σ = 0,04 ns/op), Total ops = 1337024734 [ 15,00 (666659967), 14,92 (670364767)]
3 threads, Tavg = 20,40 ns/op (σ = 1,03 ns/op), Total ops = 1470560922 [ 21,21 (471592958), 19,00 (526302911), 21,16 (472665053)]
4 threads, Tavg = 29,83 ns/op (σ = 1,82 ns/op), Total ops = 1340065128 [ 31,17 (320806983), 31,58 (316358130), 26,94 (370806790), 30,11 (332093225)]
#
<< JVM END
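
For reference, the thread-local variant boils down to keeping one scratch int[4] per thread and reusing it without clearing. The sketch below is only a guess at the shape of such a context: BoxContext, getThreadContext() and fillReused() here are illustrative stand-ins, not the actual AAShapePipeContext or the benchmark code linked above.

```java
// Hypothetical stand-in for a per-thread context holding a reusable int[4].
final class BoxContext {

    // Scratch array reused across calls; left "dirty" (not cleared) between uses.
    final int[] abox = new int[4];

    // One context per thread, created lazily on first access (Java 8+).
    private static final ThreadLocal<BoxContext> PER_THREAD =
            ThreadLocal.withInitial(BoxContext::new);

    static BoxContext getThreadContext() {
        return PER_THREAD.get();
    }

    // Mimics the benchmark body: fill the reused array, no allocation per call.
    static int fillReused() {
        int[] abox = getThreadContext().abox;
        abox[0] = 7;
        abox[1] = 11;
        abox[2] = 13;
        abox[3] = 17;
        return abox[0] + abox[1] + abox[2] + abox[3];
    }

    public static void main(String[] args) {
        // The same thread always sees the same context instance.
        System.out.println(getThreadContext() == getThreadContext());
        System.out.println(fillReused());
    }
}
```

The trade-off in this kind of design is a ThreadLocal lookup per call in exchange for no int[4] allocation per call; whether that wins in the real AAShapePipe code still has to be measured.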

Best regards,
Laurent
