Hi,

With solid OpenGL support on Linux being ubiquitous these days and the XRender pipeline being a bit of a dead end (it works quite well, except for MaskBlit/MaskFill/BufferedImageOps), I was looking a bit into the state and performance of the OpenGL pipeline. Specifically, why it sometimes performs worse than XRender, even though almost all XRender implementations run on top of OpenGL these days anyway (except the proprietary NVIDIA driver).
1. One area where XRender has an advantage is implicit parallelism. While Java is producing X11 protocol, the X server can concurrently perform the drawing operations on a different core. So when running some Swing benchmarks with XRender enabled, I see Java consuming 100% of one core while the X server consumes ~50% of another one. With the OpenGL pipeline, on the other hand, just one core is fully loaded, despite a similar design (one flusher thread calling into OpenGL, and one or more independent threads queuing drawing operations into a buffer). The reason seems to be that OGLRenderQueue has just one buffer, so either the flusher thread or a queuing thread is active, but never both. I wonder: have there been attempts to introduce double-buffering here, so that the producers (AWT threads) and the consumer (queue flusher thread) can run concurrently?

2. MaskFill in particular performed quite poorly in my tests, because drivers are typically not optimized for tiny texture uploads (32x32 coverage masks). Just stubbing out the glTexSubImage2D call improved the framerate of one benchmark from 100fps to 300fps. I have done some prototyping of uploading coverage masks via a Shader Storage Buffer Object, but that requires ARB_shader_storage_buffer_object (GL 4.3) as well as glBufferStorage (GL 4.4), so effectively OpenGL 4.4. On the plus side, composition with this approach peaked at about 10GB/s with 64x64 mask tiles.

Which leads me to the next question: the pipeline is currently written in rather ancient OpenGL 2.x style. Once the need to support old OS X versions is gone, the OpenGL pipeline would not be enabled by default on any platform. Once that is the case, would it be feasible to clean up the code by making use of modern GL and leaving legacy platforms behind? Modern GL has many improvements in areas where the current pipeline suffers a bit (overhead of state changes for small drawing operations, etc.).

Thanks and best regards, Clemens
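P.S.: To make point 1 concrete, here is a toy sketch of the double-buffering idea. The class name and API are hypothetical (this is not the actual sun.java2d.opengl.OGLRenderQueue interface); it just shows the scheme: producers append opcodes into one buffer while the flusher drains the other outside the lock, so queuing and GL submission can overlap on separate cores.

```java
import java.util.Arrays;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical double-buffered render queue sketch, not the real
// OGLRenderQueue: AWT threads fill one buffer while the flusher
// thread processes the other, instead of strictly alternating.
final class DoubleBufferedRenderQueue {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull  = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();

    private int[] fillBuf;   // AWT (producer) threads append here
    private int[] flushBuf;  // flusher thread drains this one
    private int fillPos;

    DoubleBufferedRenderQueue(int capacity) {
        fillBuf  = new int[capacity];
        flushBuf = new int[capacity];
    }

    /** Producer side: blocks only while the fill buffer is full. */
    void put(int opcode) throws InterruptedException {
        lock.lock();
        try {
            while (fillPos == fillBuf.length) {
                notFull.await();            // flusher has not swapped yet
            }
            fillBuf[fillPos++] = opcode;
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    /**
     * Flusher side: swaps the two buffers under the lock (cheap pointer
     * swap), then returns the filled batch so the caller can issue GL
     * calls WITHOUT holding the lock -- producers keep filling the
     * other buffer concurrently in the meantime.
     */
    int[] takeBatch() throws InterruptedException {
        lock.lock();
        try {
            while (fillPos == 0) {
                notEmpty.await();
            }
            int[] tmp = flushBuf;
            flushBuf  = fillBuf;
            fillBuf   = tmp;
            int len   = fillPos;
            fillPos   = 0;
            notFull.signalAll();
            return Arrays.copyOf(flushBuf, len);
        } finally {
            lock.unlock();
        }
    }
}
```

The flusher thread would simply loop over takeBatch() and translate each opcode into GL calls; the point is that the expensive GL work happens outside the lock, which is where the XRender-style overlap between the two cores would come from.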