Hi Sergey,

> The design is a little bit different. The XRender pipeline:
> - Many threads call draw operations and send them to the XServer
> - The XServer decodes the draw operations and pushes the commands to the "opengl buffer"
> - Some video driver code decodes the ogl buffer and actually draws something
> The OGL pipeline:
> - Many threads call draw operations and save them in the internal RenderQueue buffer
> - One RenderQueue thread decodes the draw operations and pushes the commands to the "opengl buffer"
> - Some video driver code decodes the ogl buffer and actually draws something
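The producer/consumer shape both pipelines share can be sketched as a toy model (these are not the actual JDK classes, just an illustration): several producer threads enqueue opaque "draw commands" while one consumer thread drains and decodes them, so in principle both sides can run concurrently.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the shared pipeline shape: N producers, one consumer,
// decoupled by a queue (unix socket for X11, in-memory for RenderQueue).
public class PipelineSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
        AtomicInteger decoded = new AtomicInteger();

        // Consumer: plays the role of the X server / RenderQueue flusher thread.
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String cmd = queue.take();
                    if (cmd.equals("EOF")) return;
                    decoded.incrementAndGet(); // "decode" and submit to the GPU here
                }
            } catch (InterruptedException ignored) { }
        });
        consumer.start();

        // Producers: application threads issuing draw operations.
        Thread[] producers = new Thread[3];
        for (int p = 0; p < producers.length; p++) {
            producers[p] = new Thread(() -> {
                for (int i = 0; i < 100; i++) {
                    try { queue.put("fillRect"); } catch (InterruptedException ignored) { }
                }
            });
            producers[p].start();
        }
        for (Thread t : producers) t.join();
        queue.put("EOF");
        consumer.join();
        System.out.println("decoded " + decoded.get() + " commands");
    }
}
```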
So in both cases producers (application threads) and the consumer are decoupled via some kind of queue (a unix domain socket for X11, an in-memory queue for the RenderQueue) and in theory could operate concurrently.

> I am not sure that it works that way; the only way to block the queuing thread is to call "flushNow()", in other cases it should not be blocked.
> The OGLRenderQueue thread, however, can become blocked every 100ms.

Since there is only one RenderQueue+buffer, the entire queue has to be locked while the commands are processed by the RenderQueue thread (via the AWT lock held by the thread which implicitly calls flushNow, triggered by a full buffer). This severely limits parallelism: you can either process commands (RenderQueue thread) or queue new commands (AWT threads), but not both at the same time. This led me to the original question - whether this was a necessity caused by structural requirements/limitations, or whether it simply hasn't been thought through / implemented.
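To illustrate the alternative, here is a hypothetical, deliberately single-threaded sketch of the double-buffering idea (class and field names are mine, not the OGLRenderQueue code): producers append to a "front" buffer under a short lock; when it fills, front and back are swapped so the full batch can be flushed outside the lock while queuing continues into the other buffer. A real version would run the flush on a dedicated thread and would have to wait if the back buffer is still being flushed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical double-buffered command queue: swap-on-full instead of
// blocking all producers for the whole flush.
public class DoubleBufferedQueue {
    private List<String> front = new ArrayList<>();
    private List<String> back  = new ArrayList<>();
    private final Object lock = new Object();
    private int flushedBatches = 0;

    void enqueue(String cmd) {
        List<String> toFlush = null;
        synchronized (lock) {
            front.add(cmd);
            if (front.size() >= 4) {   // buffer "full": swap front and back
                toFlush = front;
                front = back;
                back = toFlush;
            }
        }
        if (toFlush != null) {
            flush(toFlush);            // happens outside the lock
        }
    }

    private void flush(List<String> batch) {
        // decode + submit to the GPU here; in a threaded version, producers
        // would keep filling the other buffer meanwhile
        batch.clear();
        flushedBatches++;
    }

    public static void main(String[] args) {
        DoubleBufferedQueue q = new DoubleBufferedQueue();
        for (int i = 0; i < 10; i++) q.enqueue("op" + i);
        System.out.println("flushed batches: " + q.flushedBatches);
    }
}
```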
I did some prototyping of a double-buffered RenderQueue (no other changes) over the weekend, and the results are really promising (1x1 fillRect to measure pipeline overhead, 20x20 aa oval to get an impression of data-heavy buffer interaction):

Test(graphics.render.tests.fillRect) averaged 1.4837645794468326E7 pixels/sec with !antialias, to VolatileImg(), ident, SrcOver, single, width1, 1x1, !clip, bounce, Default, !xormode, !alphacolor, !extraalpha
Test(graphics.render.tests.fillOval) averaged 3.896264839428713E7 pixels/sec with !alphacolor, SrcOver, 20x20, Default, antialias, bounce, !xormode, to VolatileImg(), !clip, width1, !extraalpha, single, ident

whereas the original JDK with OGL yielded:

Test(graphics.render.tests.fillRect) averaged 5061909.644344761 pixels/sec with single, 1x1, SrcOver, !extraalpha, bounce, Default, !xormode, to VolatileImg(), ident, !alphacolor, width1, !antialias, !clip
Test(graphics.render.tests.fillOval) averaged 1.0837940280832347E7 pixels/sec with single, 20x20, SrcOver, !extraalpha, bounce, Default, !xormode, to VolatileImg(), ident, !alphacolor, width1, antialias, !clip

and with XRender:

Test(graphics.render.tests.fillRect) averaged 2.5252814688096754E7 pixels/sec with ident, to VolatileImg(), 1x1, !clip, !extraalpha, width1, !alphacolor, Default, single, bounce, !antialias, !xormode, SrcOver
Test(graphics.render.tests.fillOval) averaged 2.53725229244114E7 pixels/sec with ident, to VolatileImg(), 20x20, !clip, !extraalpha, width1, !alphacolor, Default, single, bounce, antialias, !xormode, SrcOver

To be honest I don't have an explanation for why the results are *that* good, but on the other hand - XRender is still clearly faster for 1x1 fillRect, and not *that* much slower for the maskfills.

> There are many ways to improve the OGL pipeline using new OGL features, but be aware that currently the OGL
> pipeline is the only cross-platform pipeline which works on virtually all devices (new and old).
> A similar architecture issue exists in the d3d pipeline as well, so maybe Vulkan will be better -
> it could replace XRender and D3D someday?
>
> > I thought Vulkan superseded OGL, no?

It did, and for sure it would be a good/better path forward. On the other hand it would require a new pipeline implementation and not just incremental changes - something hard to do in spare time ;)

Thanks and best regards, Clemens