Jim,

Here are the benchmark results for three configurations:
- REF: Marlin reference = initial capacities tuned for arrays and OffHeapEdgeArray
- NO_INITIALS: initial arrays = [0]
- NO_INITIALS_OFFHEAP_16: initial arrays = [0] and OffHeapEdgeArray(16)

I pushed all details (stats & benchmarks):
http://cr.openjdk.java.net/~lbourges/marlin/bench_initial_arrays/

1/ Benchmark results:

The OffHeapEdgeArray size is more critical: NO_INITIALS_OFFHEAP_16 is ~5% slower than the previous test (initial arrays = [0]).

Renderer                 Test count   30        10        10        10
                         Threads      4         1         2         4
REF                      Pct95        237.848   233.887   238.43    241.226
NO_INITIALS              Pct95        244.261   241.116   244.028   247.639
NO_INITIALS_OFFHEAP_16   Pct95        257.091   253.211   256.13    261.93

For the complex map, it is more pronounced: ~20% slower than the reference test:

REF:
dc_shp_alllayers_2013-00-30-07-00-47.ser 4 100 770.511 775.448 770.448 4.668 765.125 787.473 100 100.00%

NO_INITIALS_OFFHEAP_16:
dc_shp_alllayers_2013-00-30-07-00-47.ser 4 100 902.238 934.679 910.759 14.478 898.332 956.92 100 120.53%

NO_INITIALS:
dc_shp_alllayers_2013-00-30-07-00-47.ser 4 100 815.775 823.593 817.352 6.752 813.031 872.658 100 106.21%

2/ Statistics:

The number of cache accesses (and the array sizes per bucket) is very high. For example:

- stats_NO_INITIALS.log:
Loading DrawingCommands: ../maps/dc_shp_alllayers_2013-00-30-07-00-47.ser
Loaded DrawingCommands: DrawingCommands{width=1400, height=800, commands=135213}
...
INFO: ArrayCache: int resize: 0 - dirty int resize: 140612 - dirty float resize: 104025 - dirty byte resize: 103966 - oversize: 0
...
INFO: Array caches for thread: ctx1
INFO: IntArrayCache[4096]: get: 281224 created: 2 - returned: 281224 :: cache size: 2
INFO: Dirty Array caches for thread: ctx1
INFO: IntArrayCache[4096]: get: 562448 created: 4 - returned: 562448 :: cache size: 4
INFO: FloatArrayCache[4096]: get: 104025 created: 2 - returned: 104025 :: cache size: 2
INFO: ByteArrayCache[65536]: get: 103966 created: 1 - returned: 103966 :: cache size: 1

- stats_NO_INITIALS_OFFHEAP_16.log:
INFO: renderer.edges.resize[483598] sum: 86874016 avg: 179.64 [32 | 4096]

The OffHeapEdgeArray is resized a lot for this map (483598 times): 4096 is a good capacity for this test case.
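
To make the cost of a tiny initial capacity concrete, here is a minimal sketch, assuming the edge array grows by doubling until it fits. That policy is an assumption on my side, although the [32 | 67108864] entry reported below for the spiral test (22 resizes, sum 134217696) is consistent with doubling from 16. GrowthSketch, doublings and totalAllocated are hypothetical names used for illustration only, not Marlin code.

```java
// Illustration only: counts the resizes a doubling growth policy would need.
// The doubling policy itself is an assumption, not taken from Marlin's sources.
final class GrowthSketch {

    /** Number of doubling steps needed to grow from 'initial' to at least 'needed'. */
    static int doublings(long initial, long needed) {
        if (initial <= 0) {
            throw new IllegalArgumentException("initial capacity must be > 0");
        }
        int steps = 0;
        for (long capacity = initial; capacity < needed; capacity *= 2) {
            steps++;
        }
        return steps;
    }

    /** Sum of the capacities allocated by those doubling steps. */
    static long totalAllocated(long initial, long needed) {
        if (initial <= 0) {
            throw new IllegalArgumentException("initial capacity must be > 0");
        }
        long total = 0;
        long capacity = initial;
        while (capacity < needed) {
            capacity *= 2;
            total += capacity;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(doublings(16, 4096));            // 8
        System.out.println(doublings(4096, 4096));          // 0
        System.out.println(doublings(16, 67108864L));       // 22
        System.out.println(totalAllocated(16, 67108864L));  // 134217696
    }
}
```

Under that assumption, starting at 16 costs 8 resizes every time a shape needs 4096, whereas an initial capacity of 4096 costs none.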

Several test cases need a lot more memory: 32K, 64K or 128K.

stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[15915] sum: 16182208 avg: 1016.789 [32 | 131072]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[7807] sum: 6053440 avg: 775.386 [32 | 65536]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[2231] sum: 4420224 avg: 1981.274 [32 | 131072]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[483598] sum: 86874016 avg: 179.64 [32 | 4096]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[4696] sum: 1284224 avg: 273.471 [32 | 8192]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[1655] sum: 520224 avg: 314.334 [32 | 8192]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[794] sum: 1068960 avg: 1346.297 [32 | 16384]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[852] sum: 938048 avg: 1100.995 [32 | 32768]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[22] sum: 134217696 avg: 6100804.363 [32 | 67108864]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[62062] sum: 9914976 avg: 159.759 [32 | 65536]

The spiral test needs up to 67 108 864 bytes (64 MB)!

To sum up, I already tuned the initial capacities according to my benchmarks without consuming too much memory (~512K). However, I agree these capacities can be adjusted again depending on the workload, or if you have any preference.

3/ Heap size:

I ran the NO_INITIALS test again with only a 512m heap:

==> marlin_NO_INITIALS_Xmx512m.log <==
Threads   4         1         2         4
Pct95     250.374   240.754   250.038   260.331

==> marlin_NO_INITIALS.log <==
Threads   4         1         2         4
Pct95     244.261   241.116   244.028   247.639

So the smaller the heap, the bigger the impact of the weak cache! With 4 threads, Pct95 rises from 247.639 to 260.331 (~5%), while the single-thread result is essentially unchanged.
Adding more threads means more renderer contexts, each with its own caches, which creates more (weakly referenced) garbage. Typically, the weak cache hurts applications with little memory, or web servers handling many concurrent map requests.
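
To illustrate the "weak cache" effect, here is a minimal sketch, assuming each thread keeps its array cache behind a WeakReference so that the GC may reclaim it under memory pressure. The class names (IntArrayCacheSketch, WeakCacheHolder) and the exact way the cache is held are assumptions for illustration, not Marlin's actual code; only the get / created / returned counters mirror the stats shown above.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayDeque;

// Hypothetical fixed-size int[] pool with the same counters as the stats log.
final class IntArrayCacheSketch {
    private final int arraySize;
    private final ArrayDeque<int[]> pool = new ArrayDeque<>();
    int getCount, createdCount, returnedCount;

    IntArrayCacheSketch(int arraySize) {
        this.arraySize = arraySize;
    }

    /** Reuses a pooled array if one is available, otherwise allocates a new one. */
    int[] get() {
        getCount++;
        int[] array = pool.pollLast();
        if (array == null) {
            createdCount++;
            array = new int[arraySize];
        }
        return array;
    }

    /** Gives an array back to the pool; arrays of another size are ignored. */
    void put(int[] array) {
        if (array.length == arraySize) {
            returnedCount++;
            pool.addLast(array);
        }
    }
}

// Hypothetical per-thread holder: the cache is only weakly reachable between
// uses, so a collection can discard it and every pooled array with it.
final class WeakCacheHolder {
    private static final ThreadLocal<WeakReference<IntArrayCacheSketch>> LOCAL =
            new ThreadLocal<>();

    static IntArrayCacheSketch cache() {
        WeakReference<IntArrayCacheSketch> ref = LOCAL.get();
        IntArrayCacheSketch cache = (ref != null) ? ref.get() : null;
        if (cache == null) {
            // First use on this thread, or the previous cache was collected:
            // start again with an empty pool.
            cache = new IntArrayCacheSketch(4096);
            LOCAL.set(new WeakReference<>(cache));
        }
        return cache;
    }
}
```

In this sketch, each time the GC clears the weak reference, the thread must rebuild its cache and reallocate its arrays; a smaller heap triggers collections more often, which would match the slowdown seen with 512m.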

To conclude, the less garbage Marlin produces, the better it performs.
To be fair, I should also rerun the reference test with 512m, but let's stop here for now.

I hope these new results give you an overview of the memory / array cache issue that Marlin has to deal with.

Laurent