Hi Sergey,

> The design is a little bit different. The XRender pipeline:
>   - Many threads call draw operations and send them to the XServer
>   - The XServer decodes the draw operations and pushes the commands to
>     the "opengl buffer"
>   - Some video driver code decodes the GL buffer and actually draws
>     something
> The OGL pipeline:
>   - Many threads call draw operations and save them in the internal
>     RenderQueue buffer
>   - One RenderQueue thread decodes the draw operations and pushes the
>     commands to the "opengl buffer"
>   - Some video driver code decodes the GL buffer and actually draws
>     something

So in both cases the producers (application threads) and the consumer are
decoupled via some kind of queue (a Unix domain socket for X11, an
in-memory queue for the RenderQueue) and could, in theory, operate
concurrently.
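To make the decoupling concrete, here is a minimal sketch (hypothetical names, not actual JDK pipeline code) where application threads produce draw commands into a bounded queue and a single consumer thread, playing the role of the XServer or the RenderQueue thread, drains and "decodes" them concurrently:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative producer/consumer sketch (hypothetical class, not the
// actual sun.java2d.pipe code): draw commands flow through a bounded
// queue, so queuing and decoding can overlap.
public class PipelineSketch {

    // Enqueue `commands` draw ops while one consumer thread drains them;
    // returns how many commands the consumer decoded.
    static int run(int commands) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(64);
        AtomicInteger decoded = new AtomicInteger();

        Thread consumer = new Thread(() -> {
            try {
                String cmd;
                while (!(cmd = queue.take()).equals("EOF")) {
                    decoded.incrementAndGet(); // stand-in for decode + GL upload
                }
            } catch (InterruptedException ignored) { }
        });
        consumer.start();

        // Producer side: put() blocks only when the bounded buffer is
        // full, i.e. only when the consumer falls behind.
        for (int i = 0; i < commands; i++) {
            queue.put("fillRect " + i);
        }
        queue.put("EOF");
        consumer.join();
        return decoded.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("decoded " + run(100) + " commands");
    }
}
```

The bounded queue is what lets both sides run at the same time; the interesting question is how far the real pipelines actually exploit that.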

> I am not sure that it works that way; the only way to block the queuing
> thread is to call "flushNow()", in other cases it should not be blocked.
> The OGLRenderQueue thread, however, could become blocked every 100ms.

Since there is only one RenderQueue+Buffer, the entire queue has to be
locked while the commands are processed by the RenderQueue thread (via the
AWT lock, held by the thread that implicitly calls flushNow when the
buffer is full).
This severely limits parallelism: you can either process commands
(RenderQueue thread) or queue new commands (AWT threads), but not both at
the same time.
This led me to the original question - whether this was a necessity caused
by structural requirements/limitations or whether it simply hasn't been
thought through / implemented.
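The double-buffered idea can be sketched roughly like this (hypothetical class and method names, not the actual sun.java2d.pipe.RenderQueue API): producers fill a front buffer while the flusher thread drains the back buffer, so a full buffer costs only a swap instead of a full flush under the AWT lock.

```java
// Hedged sketch of a double-buffered render queue. Producers append to
// the front buffer; when it fills up, the buffers are swapped and the
// (imagined) flusher thread drains the back buffer concurrently.
class DoubleBufferedQueue {
    private int[] front, back;
    private int frontPos = 0;      // next free slot in the front buffer
    private int backLen = 0;       // commands handed to the flusher
    private boolean flushing = false;

    DoubleBufferedQueue(int capacity) {
        front = new int[capacity];
        back = new int[capacity];
    }

    // Producer side: blocks only while the *previous* flush is still
    // running, not for the duration of every flush.
    synchronized void enqueue(int opcode) throws InterruptedException {
        while (frontPos == front.length) {
            while (flushing) {
                wait();            // previous back buffer not drained yet
            }
            int[] tmp = back; back = front; front = tmp;
            backLen = frontPos;
            frontPos = 0;
            flushing = true;       // a real queue would wake the flusher here
        }
        front[frontPos++] = opcode;
    }

    // Flusher side: called once the back buffer has been processed.
    synchronized void flushDone() {
        flushing = false;
        backLen = 0;
        notifyAll();
    }

    synchronized int queuedInFront() { return frontPos; }
    synchronized int pendingInBack() { return backLen; }
}
```

With this scheme the AWT threads only stall when they manage to fill a second buffer before the first one has been drained, which is exactly the parallelism the single-buffer design forbids.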

I did some prototyping of a double-buffered RenderQueue (no other changes)
over the weekend, and the results are really promising (1x1 fillRect to
measure pipeline overhead, 20x20 antialiased oval to get an impression of
data-heavy buffer interaction):

Test(graphics.render.tests.fillRect) averaged 1.4837645794468326E7
pixels/sec
   with !antialias, to VolatileImg(), ident, SrcOver, single, width1, 1x1,
!clip, bounce, Default, !xormode, !alphacolor, !extraalpha
Test(graphics.render.tests.fillOval) averaged 3.896264839428713E7 pixels/sec
   with !alphacolor, SrcOver, 20x20, Default, antialias, bounce, !xormode,
to VolatileImg(), !clip, width1, !extraalpha, single, ident

whereas the original JDK with OGL yielded:

Test(graphics.render.tests.fillRect) averaged 5061909.644344761 pixels/sec
   with single, 1x1, SrcOver, !extraalpha, bounce, Default, !xormode, to
VolatileImg(), ident, !alphacolor, width1, !antialias, !clip
Test(graphics.render.tests.fillOval) averaged 1.0837940280832347E7
pixels/sec
   with single, 20x20, SrcOver, !extraalpha, bounce, Default, !xormode, to
VolatileImg(), ident, !alphacolor, width1, antialias, !clip

and with XRender:
Test(graphics.render.tests.fillRect) averaged 2.5252814688096754E7
pixels/sec
   with ident, to VolatileImg(), 1x1, !clip, !extraalpha, width1,
!alphacolor, Default, single, bounce, !antialias, !xormode, SrcOver
Test(graphics.render.tests.fillOval) averaged 2.53725229244114E7 pixels/sec
   with ident, to VolatileImg(), 20x20, !clip, !extraalpha, width1,
!alphacolor, Default, single, bounce, antialias, !xormode, SrcOver

To be honest I don't have an explanation for why the results are *that*
good, but on the other hand XRender is still clearly faster for the 1x1
fillRect and not *that* much slower for the maskfills.

> There are many ways to improve the OGL pipeline using new OGL features,
> but be aware that currently the OGL pipeline is the only cross-platform
> pipeline which works on virtually all devices (new and old).
> A similar architectural issue exists in the D3D pipeline as well, so
> maybe Vulkan will be better; it could replace XRender and D3D someday?
>
> I thought Vulkan superseded OGL, no?

It did, and for sure it would be a good/better path forward.
On the other hand, it would require a new pipeline implementation rather
than just incremental changes, which is hard to do in spare time ;)

Thanks and best regards, Clemens
