Hi Clemens,

This reminds me of our discussion a few years ago, when I experimented with
OpenGL on Linux and tuned the OGLRenderQueue for the Marlin renderer, see:
- https://github.com/bourgesl/marlin-renderer/blob/jdk/src/main/java/sun/java2d/pipe/RenderQueue.java
- https://github.com/bourgesl/marlin-renderer/blob/jdk/src/main/java/sun/java2d/opengl/OGLRenderQueue.java

1. These changes mainly consist of increasing the internal queue buffer from
32 KB to 1 MB (based on my own performance tests). Ideally, a heuristic
(data volume per second) could estimate the appropriate queue capacity and
automatically balance the producer writing and consumer reading speeds.

I like your idea of having multiple queue producers, but does a single
consumer imply a single destination Surface at a time? How would parallel
rendering on different surfaces (OpenGL context switches) work?
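To make the double-buffering idea concrete, here is a minimal sketch of a
producer/consumer buffer swap (class and method names are purely
illustrative, not the actual sun.java2d.pipe.RenderQueue API; a real
implementation would also need RenderQueue's existing locking and ordering
guarantees):

    import java.nio.ByteBuffer;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    /**
     * Double-buffered command queue sketch: while the flusher thread drains
     * one buffer, producers keep filling the other, so both run concurrently.
     */
    final class DoubleBufferedQueue {
        private static final int CAPACITY = 1 << 20; // 1 MB, as tuned above

        // Empty buffers for producers, filled buffers for the flusher.
        private final BlockingQueue<ByteBuffer> free  = new ArrayBlockingQueue<>(2);
        private final BlockingQueue<ByteBuffer> ready = new ArrayBlockingQueue<>(2);

        private ByteBuffer current; // buffer producers are writing into

        DoubleBufferedQueue() throws InterruptedException {
            free.put(ByteBuffer.allocateDirect(CAPACITY));
            free.put(ByteBuffer.allocateDirect(CAPACITY));
            current = free.take();
        }

        /** Producer (AWT) side; assumes a single op always fits in one buffer. */
        synchronized void enqueue(byte[] op) throws InterruptedException {
            if (current.remaining() < op.length) {
                flushNow(); // hand the full buffer over, grab an empty one
            }
            current.put(op);
        }

        /** Swap: give the current buffer to the flusher, take a free one. */
        synchronized void flushNow() throws InterruptedException {
            current.flip();
            ready.put(current);     // flusher can start draining immediately
            current = free.take();  // blocks only if both buffers are in flight
        }

        /** Flusher thread body, e.g. started as new Thread(q::runFlusher). */
        void runFlusher() {
            try {
                while (true) {
                    ByteBuffer buf = ready.take();
                    // ... execute the buffered ops against the GL context ...
                    buf.clear();
                    free.put(buf);  // recycle so producers rarely stall
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }
    }

With only two buffers the producer can still stall when the flusher falls
behind, which is exactly where the capacity heuristic above would help.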
2. As the Marlin renderer supports large tiles (up to 256x256), it is very
easy to adjust the Marlin tile size at runtime for performance testing:

-Dsun.java2d.renderer.log=true
-Dsun.java2d.renderer.tileSize_log2=5
-Dsun.java2d.renderer.tileWidth_log2=5

By default, 5 means 2^5 = 32. Just set larger values:

-Dsun.java2d.renderer.tileSize_log2=6
-Dsun.java2d.renderer.tileWidth_log2=6
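For example, a benchmark could be launched with 64x64 tiles like this (the
jar name is just a placeholder; -Dsun.java2d.opengl=true enables the OpenGL
pipeline):

    java -Dsun.java2d.opengl=true \
         -Dsun.java2d.renderer.log=true \
         -Dsun.java2d.renderer.tileSize_log2=6 \
         -Dsun.java2d.renderer.tileWidth_log2=6 \
         -jar MySwingBenchmark.jar

With the log flag enabled, Marlin should print its effective settings at
startup, so the tile size actually in use can be verified.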
Finally, your proposal is amazing and I could join you in your efforts, if
we work publicly on the same repository (github.com?).

Cheers and Happy New Year,
Laurent

On Fri, Jan 15, 2021 at 09:35, Clemens Eisserer <linuxhi...@gmail.com> wrote:

> Hi,
>
> With solid OpenGL support on Linux being ubiquitous these days and the
> XRender pipeline being a bit of a dead end (it works quite well except for
> MaskBlit/MaskFill/BufferedImageOps), I was looking a bit into the
> state/performance of the OpenGL pipeline, specifically why it sometimes
> performs worse than XRender, despite almost all XRender implementations
> running on top of OpenGL these days anyway (except proprietary NVIDIA).
>
> 1. One area where XRender has an advantage is implicit parallelism.
> While Java is producing X11 protocol, the XServer can concurrently perform
> the drawing operations on a different core.
> Therefore, when running some Swing benchmarks with XRender enabled, I see
> java consuming 100% of one core while the XServer consumes ~50% of
> another.
> With the OpenGL pipeline, on the other hand, just one core is fully
> loaded, despite a similar design (one flusher thread calling into OpenGL,
> and one or more independent threads queuing drawing operations into a
> buffer).
>
> The reason seems to be that the OGLRenderQueue has just one buffer, so
> either the flusher thread is active or a queuing thread, but not both.
> I wonder, have there been attempts to introduce double-buffering here, so
> that the producers (AWT threads) and the consumer (queue flusher thread)
> can run concurrently?
>
> 2. MaskFill in particular performed quite poorly in my tests, because
> drivers are typically not optimized for tiny texture uploads (32x32
> coverage masks).
> Just stubbing out the subTex2D call improved the framerate of one
> benchmark from 100 fps to 300 fps.
> I have done some prototyping uploading coverage masks via
> Shader_Storage_Buffer_Object, but that requires
> ARB_shader_storage_buffer_object (GL 4.3) as well as glBufferStorage
> (GL 4.4), so effectively OpenGL 4.4.
> On the plus side, composition with this approach peaked at about 10 GB/s
> with 64x64 mask tiles.
>
> Which leads me to the next question:
> The pipeline is currently written in rather ancient OpenGL 2.x style.
> Once the need to support old OSX versions is gone, the OpenGL pipeline
> would not be enabled by default on any platform.
> Once this is the case, would it be feasible to clean up the code by
> making use of modern GL and by leaving legacy platforms behind?
> Modern GL has many improvements in areas where the current pipeline is
> suffering a bit (overhead of state changes for small drawing operations,
> etc.).
>
> Thanks and best regards,
> Clemens

-- 
Laurent Bourgès