Hi Clemens,

I reread the source code and wanted to add metrics such as wait_ratio and some new heuristics to auto-tune the buffer capacity.
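For example, here is a rough sketch of what I mean by wait_ratio, assuming it is defined as the time the queuing threads spend blocked divided by the total time of their queue operations. The class WaitRatio, its methods, and the doubling heuristic in nextCapacity are hypothetical names and choices of mine, not existing JDK code:

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper (not part of the JDK): tracks how long producers wait on the queue. */
final class WaitRatio {
    private final AtomicLong waitNanos = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    /** Records one queue operation: the part spent blocked and its total duration. */
    void record(long waitedNanos, long elapsedNanos) {
        waitNanos.addAndGet(waitedNanos);
        totalNanos.addAndGet(elapsedNanos);
    }

    /** Fraction of the recorded time spent waiting; 0 if nothing was recorded yet. */
    double ratio() {
        long total = totalNanos.get();
        return total == 0L ? 0.0 : (double) waitNanos.get() / total;
    }

    /** Assumed heuristic: double the capacity while the ratio exceeds the threshold, up to a cap. */
    static int nextCapacity(int capacity, double ratio, double threshold, int maxCapacity) {
        if (ratio > threshold) {
            return (int) Math.min((long) capacity * 2L, maxCapacity);
        }
        return capacity;
    }
}
```

The threshold and the cap would have to be found by measurement; I have no good values for them yet.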
I was also tempted to rewrite the buffer overflow strategy: swap the buffer or use a larger one. Apparently you took another approach (a double buffer) to reduce the wait latency in the AWT threads and let the GL thread run concurrently: excellent!

On Sun, 17 Jan 2021 at 13:11, Clemens Eisserer <linuxhi...@gmail.com> wrote:

> Hi Sergey,
>
> > The design is a little bit different, the XRender pipeline:
> >  - Many threads which call draw operations and send them to the XServer
> >  - The XServer decodes the draw operation and pushes the commands to the "opengl buffer"
> >  - Some video driver code which decodes the ogl buffer and actually draws something
> > The OGL pipeline:
> >  - Many threads call draw operations, and save them in the internal RenderQueue buffer
> >  - One RenderQueue thread decodes the draw operation and pushes the commands to the "opengl buffer"
> >  - Some video driver code which decodes the ogl buffer and actually draws something
>
> So in both cases producers (application threads) and the consumer are decoupled via some kind of queue (unix domain socket for x11, in-memory queue for the renderqueue) and in theory could operate concurrently.
>
> > I am not sure that it works that way; the only way to block the queuing thread is to call "flushNow()", for other cases it should not be blocked.
> > The OGLRenderQueue thread however could become blocked every 100ms.
>
> Since there is only one RenderQueue+Buffer, the entire queue has to be locked while the commands are processed by the RenderQueue-Thread (via the AWT lock held by the thread which is implicitly calling flushNow triggered by a full buffer).
> This severely limits parallelism - you can either process commands (RenderQueue thread) or queue new commands (AWT threads) but not both at the same time.
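To check that I understand the idea, here is a minimal sketch of a double-buffered queue. It is only my guess, not your patch and not the JDK RenderQueue code: the class DoubleBufferedQueue and its methods enqueue/drain are hypothetical names, commands are modelled as plain Runnables, and a single consumer thread is assumed. The lock is held only while appending and while swapping the two buffers, so producers can keep queuing while the consumer processes the previous batch:

```java
import java.util.ArrayList;

/** Hypothetical double-buffered command queue (illustration only, single consumer assumed). */
final class DoubleBufferedQueue {
    private final Object lock = new Object();
    private final int capacity;
    private ArrayList<Runnable> back;   // producers append here, under the lock
    private ArrayList<Runnable> spare;  // empty buffer kept for the next swap

    DoubleBufferedQueue(int capacity) {
        this.capacity = capacity;
        this.back = new ArrayList<>(capacity);
        this.spare = new ArrayList<>(capacity);
    }

    /** Producer side: blocks only while the back buffer is full. */
    void enqueue(Runnable command) {
        synchronized (lock) {
            while (back.size() >= capacity) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for buffer space", e);
                }
            }
            back.add(command);
        }
    }

    /** Consumer side: swaps the buffers under the lock, then runs the batch without holding it. */
    int drain() {
        ArrayList<Runnable> front;
        synchronized (lock) {
            front = back;
            back = spare;      // producers now fill the empty buffer
            spare = null;
            lock.notifyAll();  // wake producers blocked on a full buffer
        }
        for (Runnable command : front) {
            command.run();     // the lock is not held here
        }
        int processed = front.size();
        front.clear();
        synchronized (lock) {
            spare = front;     // recycle the drained buffer for the next swap
        }
        return processed;
    }
}
```

In this sketch drain() does not block when there is nothing to do, so the consumer thread would have to poll it or be woken up some other way; that part is left out.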
> This leads me to the original question - whether this was a necessity caused by structural requirements/limitations or whether it simply hasn't been thought through / implemented.
>
> I did some prototyping of a double-buffered renderqueue (no other changes) over the weekend and the results are really promising (1x1 fillRect to measure pipeline overhead, 20x20 aa oval to get an impression of data-heavy buffer interaction):
>
> Test(graphics.render.tests.fillRect) averaged 1.4837645794468326E7 pixels/sec
>   with !antialias, to VolatileImg(), ident, SrcOver, single, width1, 1x1, !clip, bounce, Default, !xormode, !alphacolor, !extraalpha
> Test(graphics.render.tests.fillOval) averaged 3.896264839428713E7 pixels/sec
>   with !alphacolor, SrcOver, 20x20, Default, antialias, bounce, !xormode, to VolatileImg(), !clip, width1, !extraalpha, single, ident
>
> whereas the original JDK with OGL yielded:
>
> Test(graphics.render.tests.fillRect) averaged 5061909.644344761 pixels/sec
>   with single, 1x1, SrcOver, !extraalpha, bounce, Default, !xormode, to VolatileImg(), ident, !alphacolor, width1, !antialias, !clip
> Test(graphics.render.tests.fillOval) averaged 1.0837940280832347E7 pixels/sec
>   with single, 20x20, SrcOver, !extraalpha, bounce, Default, !xormode, to VolatileImg(), ident, !alphacolor, width1, antialias, !clip
>
> and with XRender:
> Test(graphics.render.tests.fillRect) averaged 2.5252814688096754E7 pixels/sec
>   with ident, to VolatileImg(), 1x1, !clip, !extraalpha, width1, !alphacolor, Default, single, bounce, !antialias, !xormode, SrcOver
> All test results:
> Test(graphics.render.tests.fillOval) averaged 2.53725229244114E7 pixels/sec
>   with ident, to VolatileImg(), 20x20, !clip, !extraalpha, width1, !alphacolor, Default, single, bounce, antialias, !xormode, SrcOver

Could you share your patch so that I can study it and try to improve it?

Cheers,
Laurent