Re: [OpenJDK Rasterizer] RFR: Marlin renderer #2

Laurent Bourgès Fri, 19 Jun 2015 05:10:15 -0700

Jim,

here are the benchmark results:
- REF: Marlin reference = initial capacity tuned for arrays and
OffHeapEdgeArray
- NO_INITIALS: initial arrays = [0]
- NO_INITIALS_OFFHEAP_16: initial arrays = [0] and OffHeapEdgeArray(16)


I pushed all details (stats & benchmarks):
http://cr.openjdk.java.net/~lbourges/marlin/bench_initial_arrays/


1/ Benchmark results:

The OffHeapEdgeArray size is more critical: NO_INITIALS_OFFHEAP_16 is ~5%
slower than the previous test (NO_INITIALS, initial arrays = [0]).

Renderer                 Test count       30       10       10       10
                         Threads           4        1        2        4
REF                      Pct95       237.848  233.887   238.43  241.226
NO_INITIALS              Pct95       244.261  241.116  244.028  247.639
NO_INITIALS_OFFHEAP_16   Pct95       257.091  253.211   256.13   261.93

For the complex map, it is more pronounced: ~20% slower than the reference
test:

REF:
  dc_shp_alllayers_2013-00-30-07-00-47.ser 4 100 770.511 775.448 770.448 4.668 765.125 787.473 100   100.00%
NO_INITIALS_OFFHEAP_16:
  dc_shp_alllayers_2013-00-30-07-00-47.ser 4 100 902.238 934.679 910.759 14.478 898.332 956.92 100   120.53%
NO_INITIALS:
  dc_shp_alllayers_2013-00-30-07-00-47.ser 4 100 815.775 823.593 817.352 6.752 813.031 872.658 100   106.21%
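
As a quick cross-check of the percentages above, here is a minimal Java
sketch. The class and method names are hypothetical, and it assumes the
percentage is the ratio of the second timing value in each row to the same
value in the REF row (the mail does not name that column, but the numbers
agree with this reading):

```java
import java.util.Locale;

final class RelativeTiming {
    /** Returns value as a percentage of reference (100.0 means equal). */
    static double percentOfReference(double value, double reference) {
        return 100.0 * value / reference;
    }

    public static void main(String[] args) {
        // Second timing value of each row quoted in the mail
        // (assumption: this is the column the percentages are based on).
        double ref = 775.448;
        double noInitials = 823.593;
        double noInitialsOffHeap16 = 934.679;

        System.out.println(String.format(Locale.ROOT, "NO_INITIALS: %.2f%%",
                percentOfReference(noInitials, ref)));            // 106.21%
        System.out.println(String.format(Locale.ROOT, "NO_INITIALS_OFFHEAP_16: %.2f%%",
                percentOfReference(noInitialsOffHeap16, ref)));   // 120.53%
    }
}
```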

2/ Statistics: cache accesses (and array sizes per bucket) are very high.

For example:
- stats_NO_INITIALS.log:
Loading DrawingCommands: ../maps/dc_shp_alllayers_2013-00-30-07-00-47.ser
Loaded DrawingCommands: DrawingCommands{width=1400, height=800, commands=*135213*}
...
INFO: ArrayCache: int resize: 0 - dirty int resize: 140612 - dirty float resize: 104025 - dirty byte resize: 103966 - oversize: 0
...
INFO: Array caches for thread: ctx1
INFO: IntArrayCache[4096]: get: 281224 created: 2 - returned: 281224 :: cache size: 2
INFO: Dirty Array caches for thread: ctx1
INFO: IntArrayCache[4096]: get: 562448 created: 4 - returned: 562448 :: cache size: 4
INFO: FloatArrayCache[4096]: get: 104025 created: 2 - returned: 104025 :: cache size: 2
INFO: ByteArrayCache[65536]: get: 103966 created: 1 - returned: 103966 :: cache size: 1

- stats_NO_INITIALS_OFFHEAP_16.log:
INFO: renderer.edges.resize[*483598*] sum: 86874016 avg: 179.64 [32 | 4096]

The OffHeapEdgeArray is resized a lot for this map (483598 resizes): 4096
is a good capacity for this test case.
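
The resize statistics can be cross-checked mechanically. The sketch below
is not part of Marlin; the class name and the field interpretation
(name[count] sum: total avg: mean [min | max]) are assumptions read off the
log line itself. It parses one line and recomputes the average from sum and
count:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ResizeStat {
    // Assumed layout: name[count] sum: <sum> avg: <avg> [<min> | <max>]
    private static final Pattern LINE = Pattern.compile(
            "([\\w.]+)\\[(\\d+)\\] sum: (\\d+) avg: ([\\d.]+) \\[(\\d+) \\| (\\d+)\\]");

    final String name;
    final long count;
    final long sum;
    final double avg;
    final long min;
    final long max;

    private ResizeStat(String name, long count, long sum, double avg, long min, long max) {
        this.name = name;
        this.count = count;
        this.sum = sum;
        this.avg = avg;
        this.min = min;
        this.max = max;
    }

    /** Parses one stat line; any prefix such as "INFO: " is ignored. */
    static ResizeStat parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a stat line: " + line);
        }
        return new ResizeStat(m.group(1),
                Long.parseLong(m.group(2)), Long.parseLong(m.group(3)),
                Double.parseDouble(m.group(4)),
                Long.parseLong(m.group(5)), Long.parseLong(m.group(6)));
    }

    /** Average recomputed from sum and count, to compare with the logged avg. */
    double recomputedAvg() {
        return (double) sum / count;
    }

    public static void main(String[] args) {
        ResizeStat s = parse(
                "INFO: renderer.edges.resize[483598] sum: 86874016 avg: 179.64 [32 | 4096]");
        System.out.println(s.name + ": " + s.count + " resizes, max " + s.max
                + ", recomputed avg " + s.recomputedAvg());
    }
}
```

For the line above, the recomputed average agrees with the logged 179.64 to
two decimals.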

Several test cases need a lot more memory: 32K, 64K or 128K.
*stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[15915] sum: 16182208 avg: 1016.789 [32 | 131072]*
*stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[7807] sum: 6053440 avg: 775.386 [32 | 65536]*
*stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[2231] sum: 4420224 avg: 1981.274 [32 | 131072]*
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[483598] sum: 86874016 avg: 179.64 [32 | 4096]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[4696] sum: 1284224 avg: 273.471 [32 | 8192]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[1655] sum: 520224 avg: 314.334 [32 | 8192]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[794] sum: 1068960 avg: 1346.297 [32 | 16384]
*stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[852] sum: 938048 avg: 1100.995 [32 | 32768]*
*stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[22] sum: 134217696 avg: 6100804.363 [32 | 67108864]*
*stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[62062] sum: 9914976 avg: 159.759 [32 | 65536]*

The spiral test needs up to 67,108,864 bytes!


To conclude, I have already tuned the initial capacities according to my
benchmarks while keeping memory consumption low (~512K). However, I agree
these capacities can be adjusted again depending on the workload, or if you
have any preference.
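
To illustrate the trade-off behind the initial capacity, here is a small
Java sketch that counts the resizes needed to reach a given capacity. It is
not Marlin code: the class and method names are hypothetical, and it
assumes the array doubles its capacity on each resize. That growth policy
is not stated in this mail, but it is consistent with the spiral test line
(resize[22], sum: 134217696, max 67108864) when starting from
OffHeapEdgeArray(16).

```java
final class GrowthSketch {
    /** Number of doublings needed to grow from initial to at least needed. */
    static int resizesToReach(long initial, long needed) {
        if (initial <= 0) {
            throw new IllegalArgumentException("initial must be > 0");
        }
        int resizes = 0;
        long capacity = initial;
        while (capacity < needed) {
            capacity *= 2;   // assumption: capacity doubles on every resize
            resizes++;
        }
        return resizes;
    }

    /** Sum of the capacities allocated by those doublings. */
    static long totalAllocated(long initial, long needed) {
        if (initial <= 0) {
            throw new IllegalArgumentException("initial must be > 0");
        }
        long total = 0;
        long capacity = initial;
        while (capacity < needed) {
            capacity *= 2;
            total += capacity;
        }
        return total;
    }

    public static void main(String[] args) {
        // Starting at 16, one shape needing 4096 costs 8 resizes;
        // starting at 4096 it costs none.
        System.out.println(resizesToReach(16, 4096));        // 8
        System.out.println(resizesToReach(4096, 4096));      // 0
        // Spiral test: 22 resizes, 134217696 in total.
        System.out.println(resizesToReach(16, 67108864L));   // 22
        System.out.println(totalAllocated(16, 67108864L));   // 134217696
    }
}
```

Under this assumption, a larger initial capacity removes the first few
resizes for every shape, at the cost of a larger array kept per context.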


3/ Heap size:

I re-ran the NO_INITIALS test with only a 512m heap:

==> marlin_NO_INITIALS_Xmx512m.log <==
Threads    4    1    2    4
Pct95    250.374    240.754    250.038    260.331

==> marlin_NO_INITIALS.log <==
Threads    4    1    2    4
Pct95    244.261    241.116    244.028    247.639

So the smaller the heap, the bigger the impact of the weak cache (Pct95 at
4 threads: 260.331 vs 247.639, while 1 thread is unchanged)!
Adding more threads means more renderer contexts, each with its own caches,
which creates more (weakly referenced) garbage.

Typically, the weak cache hurts small-memory applications or web servers
handling many concurrent map requests!
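
To make the "weak cache" idea concrete, here is a minimal Java sketch of a
per-thread context held through a WeakReference. It is only an illustration
under assumptions: the class names, the array sizes and the use of
ThreadLocal are hypothetical and are not taken from Marlin's sources.

```java
import java.lang.ref.WeakReference;

final class WeakContextSketch {
    /** Hypothetical stand-in for a renderer context owning reusable arrays. */
    static final class Context {
        final int[] ints = new int[4096];
        final float[] floats = new float[4096];
    }

    // One weakly referenced context per thread: the GC may clear it at any
    // time once no strong reference remains, e.g. under heap pressure.
    private static final ThreadLocal<WeakReference<Context>> LOCAL = new ThreadLocal<>();

    /** Returns this thread's context, recreating it if the GC cleared it. */
    static Context getContext() {
        WeakReference<Context> ref = LOCAL.get();
        Context ctx = (ref != null) ? ref.get() : null;
        if (ctx == null) {
            // Cache miss: the context and its arrays are allocated again,
            // and the cleared ones have become garbage.
            ctx = new Context();
            LOCAL.set(new WeakReference<>(ctx));
        }
        return ctx;
    }

    public static void main(String[] args) {
        Context first = getContext();
        Context second = getContext();
        // While 'first' is strongly reachable, the same instance is reused.
        System.out.println(first == second);
    }
}
```

With a small heap the GC runs more often, so such weakly held contexts are
cleared and rebuilt more frequently, and each extra thread adds one more
context to rebuild.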

To conclude, the less garbage Marlin produces, the better its performance.

To be fair, I should also re-run the reference test with 512m, but let's
stop here for now.


I hope these new results will give you an overview of the memory / array
cache issue that Marlin has to deal with.

Laurent
