Jim, Jim, here are my comments: 2016-02-04 2:55 GMT+01:00 Jim Graham <james.gra...@oracle.com>:
> Interesting, so this can happen with any custom Paint or Composite. Yes, > re-entrance detection should be done. > FYI, I proposed a bug fix and also checked the output images are now correct: http://mail.openjdk.java.net/pipermail/2d-dev/2016-February/006272.html Please also check twice AAShapePipe as it may also have problem with reentrancy as the TileState is in thread-local storage too: Should I also use another TileState ? I fixed an issue with int[] abox but I eventually see another issue with the int[] alpha if outpipe.renderPathTile() make reentrant calls too ! > Just out of curiosity - the 10% performance hit for the CLQ storage vs. > thread-local - that was specifically only for the time it takes to retreive > the Renderer, right? I can't imagine why the entire rendering operation > would take 10% longer for any mechanism that retrieves a cached Renderer - > unless perhaps it was missing the cache and allocating a new one every > time? I would think a simple synchronized block around a small cache of > Renderers would be nearly imperceptible compared to the amount of time > spent inside the rendering process... > Let me try to explain: In my MapBench tool, I have a very complex map [135 000 shapes]: dc_shp_alllayers_2013-00-30-07-00-47.ser - Marlin [TL]: Test Threads Ops Med Pct95 Avg StdDev Min Max FPS(med) TotalOps [ms/op] dc_shp_alllayers_2013-00-30-07-00-47.ser 1 25 778.610 779.885 778.876 0.622 777.904 780.050 1.284 25 dc_shp_alllayers_2013-00-30-07-00-47.ser 2 50 780.596 781.677 780.532 0.733 779.091 782.623 1.281 50 dc_shp_alllayers_2013-00-30-07-00-47.ser 4 100 781.663 788.781 783.098 4.548 779.113 805.189 1.279 100 - Marlin [CLQ]: Test Threads Ops Med Pct95 Avg StdDev Min Max FPS(med) TotalOps [ms/op] dc_shp_alllayers_2013-00-30-07-00-47.ser 1 25 787.245 789.810 787.370 1.556 784.139 790.589 1.270 25 dc_shp_alllayers_2013-00-30-07-00-47.ser 2 50 846.023 877.742 853.157 19.231 818.292 885.293 1.182 50 dc_shp_alllayers_2013-00-30-07-00-47.ser 4 100 826.174 851.683 831.681 12.548 814.943 871.630 1.210 100 As you can see, the 8 to 10% difference corresponds to the ratio between total times taken to render the map [877ms vs 780ms] (on the 95th percentile) that happens with 2 or 4 concurrent threads. I guess: 1. threads are rendering many shapes in parallel (135k) so there is a high pressure on the underlying ConcurrentLinkedQueue to add/remove the RendererContext instances. I tried also with a simple synchronized(ArrayDeque) that is slightly slower than CLQ (high concurrency) as all threads are busy. 2. More garbage is produced as every CLQ.offer() will create a CLQ.Node as seen with jmap -histo <pid>: num #instances #bytes class name ---------------------------------------------- 1: 3370992 107871744 java.awt.geom.Path2D$Float$CopyIterator * 2: 3370996 80903904 java.util.concurrent.ConcurrentLinkedQueue$Node* 3: 135226 9736272 java.awt.geom.AffineTransform 4: 858 8430464 [I 5: 135309 7770696 [F 6: 135213 6490224 it.geosolutions.java2d.DrawingCommand 7: 135213 4326816 java.awt.geom.GeneralPath 8: 99122 3964880 java.awt.BasicStroke 9: 99118 3964720 it.geosolutions.java2d.SerializableBasicStroke 10: 135391 3500728 [B 11: 135213 3245112 it.geosolutions.java2d.SerializableAlphaComposite PS: the most important garbage corresponds to the Path2D.Float.CopyIterator instances : that could also be improved ... Regards, Laurent