On 15.01.2021 00:32, Clemens Eisserer wrote:
With solid OpenGL support on Linux being ubiquitous these days and the XRender 
pipeline being a bit of a dead-end (works quite well except 
MaskBlit/MaskFill/BufferedImageOps), I was looking a bit into the 
state/performance of the OpenGL pipeline.
Specifically, why it sometimes performs worse than XRender, even though 
almost all XRender implementations run on top of OpenGL these days 
anyway (except proprietary nvidia).

1. One area where XRender has an advantage is implicit parallelism.
While Java is producing X11 protocol, the XServer can concurrently perform the 
drawing operations running on a different core.
Therefore when running some Swing benchmarks with xrender enabled I see java 
consuming 100% of one core, while the XServer consumes ~50% of another one.
With the OpenGL pipeline on the other hand, just one core is fully loaded - 
despite a similar design (one flusher thread calling into OpenGL, and one or 
more independent threads queuing drawing operations into a buffer).

The design is a little bit different. The XRender pipeline:
 - Many threads call draw operations and send them to the XServer
 - The XServer decodes the draw operations and pushes the commands to the "opengl 
buffer"
 - Some video driver code decodes the ogl buffer and actually draws 
something
The OGL pipeline:
 - Many threads call draw operations and save them in the internal RenderQueue 
buffer
 - One RenderQueue thread decodes the draw operations and pushes the commands to the 
"opengl buffer"
 - Some video driver code decodes the ogl buffer and actually draws 
something
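
To make the "save them in a buffer, decode them later" step concrete, here is 
a minimal sketch of that shape in Java. It is not the real RenderQueue code: 
the class name, the opcodes and the buffer layout are all made up for 
illustration, and the "driver" is just a list of strings.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical opcode buffer; names and layout are assumptions, not JDK code. */
final class OpBufferSketch {
    static final int OP_FILL_RECT = 1; // assumed opcode, followed by x, y, w, h
    static final int OP_DRAW_LINE = 2; // assumed opcode, followed by x1, y1, x2, y2

    // Fixed size for brevity; a real queue would flush before it overflows.
    private final ByteBuffer buf = ByteBuffer.allocate(4096);

    /** Producer side: record the operation instead of executing it. */
    synchronized void fillRect(int x, int y, int w, int h) {
        buf.putInt(OP_FILL_RECT).putInt(x).putInt(y).putInt(w).putInt(h);
    }

    /** Producer side: same idea for a second kind of operation. */
    synchronized void drawLine(int x1, int y1, int x2, int y2) {
        buf.putInt(OP_DRAW_LINE).putInt(x1).putInt(y1).putInt(x2).putInt(y2);
    }

    /** Consumer side: decode every queued operation and return what was issued. */
    synchronized List<String> flush() {
        List<String> issued = new ArrayList<>();
        buf.flip(); // switch from writing to reading
        while (buf.hasRemaining()) {
            int op = buf.getInt();
            int a = buf.getInt(), b = buf.getInt(), c = buf.getInt(), d = buf.getInt();
            switch (op) {
                case OP_FILL_RECT:
                    issued.add("fillRect " + a + " " + b + " " + c + " " + d);
                    break;
                case OP_DRAW_LINE:
                    issued.add("drawLine " + a + " " + b + " " + c + " " + d);
                    break;
                default:
                    throw new IllegalStateException("unknown opcode " + op);
            }
        }
        buf.clear(); // ready for the next batch
        return issued;
    }
}
```

In this simplified model the producers and the decoder share one lock, which 
is a shortcut of the sketch and says nothing about how the real queue locks.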

The results will differ depending on how the performance is measured.
 - For on-screen rendering you need to check the screen to see whether something is 
really being rendered (glFlush and XFlush could help a little bit, but not always)
 - For a volatile image you need to test the round trip, where you write and then 
read the data

The only obviously bad results could occur if some pipeline uses 
software rendering (even for some small intermittent operations).

The reason seems to be the OGLRenderQueue has just one buffer, so either the 
flusher thread is active or a queuing thread but not both.
I wonder, have there been attempts made to introduce double-buffering here and 
have producers (awt threads) and consumer (queue flusher thread) running 
concurrently here?

I am not sure that it works that way. The only way to block the queuing thread is to call 
"flushNow()"; in other cases it should not be blocked.
The OGLRenderQueue thread, however, could become blocked every 100ms.
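
For reference, the double-buffering idea asked about above could look roughly 
like the sketch below: producers append to a front buffer, and the single 
flusher swaps front and back under a short lock, then drains the back buffer 
outside the lock. This is only an illustration under stated assumptions. The 
class and method names are hypothetical, operations are plain strings, and it 
makes no claim about how OGLRenderQueue actually behaves.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical double-buffered op queue; must have exactly one flusher thread. */
final class DoubleBufferedQueueSketch {
    private final Object swapLock = new Object();
    private List<String> front = new ArrayList<>(); // producers append here
    private List<String> back = new ArrayList<>();  // flusher drains this one

    /** Producer side: the lock is held only for the append. */
    void enqueue(String op) {
        synchronized (swapLock) {
            front.add(op);
        }
    }

    /** Flusher side: swap under the lock, drain outside it. Returns ops drained. */
    int flush(List<String> sink) {
        List<String> toDrain;
        synchronized (swapLock) {
            toDrain = front;
            front = back; // back is always empty at this point
            back = toDrain;
        }
        // Producers keep appending to the new front while we "execute" these.
        int n = toDrain.size();
        sink.addAll(toDrain);
        toDrain.clear();
        return n;
    }

    /** Small demo: 4 producers, the calling thread plays the flusher. */
    static int demo() {
        DoubleBufferedQueueSketch q = new DoubleBufferedQueueSketch();
        List<String> sink = new ArrayList<>();
        Thread[] producers = new Thread[4];
        for (int t = 0; t < producers.length; t++) {
            final int id = t;
            producers[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) q.enqueue("op " + id + ":" + i);
            });
            producers[t].start();
        }
        boolean alive = true;
        while (alive) {
            q.flush(sink);
            alive = false;
            for (Thread p : producers) alive |= p.isAlive();
        }
        q.flush(sink); // pick up anything queued after the last check
        return sink.size();
    }
}
```

Whether this would help in practice depends on where the time really goes, 
which is exactly the open question in this thread.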


2. MaskFill in particular performed quite poorly in my tests, because drivers are 
typically not optimized for tiny texture uploads (32x32 coverage masks).
Just stubbing out the subTex2D call improved framerate of one benchmark from 
100fps to 300fps.
I have done some prototyping uploading coverage masks via 
Shader_Storage_Buffer_Object, but that requires 
ARB_shader_storage_buffer_object (GL 4.3) as well as glBufferStorage (4.4), so 
effectively OpenGL-4.4.
On the pro side of this approach composition peaked at about 10GB/s with 64x64 
mask tiles.
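
As a rough illustration of why one larger buffer can beat many tiny uploads, 
the sketch below packs several 32x32 coverage masks into one direct ByteBuffer 
and hands back the byte offset of each tile, so the whole batch could be 
uploaded in a single call. This is not the prototype described above. The class 
name, the one-byte-per-pixel layout and the offset scheme are assumptions, and 
the GL upload itself is left out.

```java
import java.nio.ByteBuffer;

/** Hypothetical staging buffer that batches coverage-mask tiles. */
final class MaskBatchSketch {
    static final int TILE = 32;                // tile edge, as in the text
    static final int TILE_BYTES = TILE * TILE; // assumes 1 byte of coverage per pixel

    private final ByteBuffer staging;

    MaskBatchSketch(int maxTiles) {
        staging = ByteBuffer.allocateDirect(maxTiles * TILE_BYTES);
    }

    /** Appends one mask and returns the byte offset where it was stored. */
    int add(byte[] mask) {
        if (mask.length != TILE_BYTES) {
            throw new IllegalArgumentException("mask must be " + TILE_BYTES + " bytes");
        }
        if (staging.remaining() < TILE_BYTES) {
            throw new IllegalStateException("batch full, drain it first");
        }
        int offset = staging.position();
        staging.put(mask);
        return offset;
    }

    /**
     * Returns a read-only view of the packed bytes and resets the batch.
     * The view shares memory with the staging buffer, so it has to be
     * consumed (uploaded) before the next add().
     */
    ByteBuffer drain() {
        ByteBuffer view = staging.duplicate();
        view.flip();     // limit = bytes written, position = 0
        staging.clear(); // start a new batch
        return view.asReadOnlyBuffer();
    }
}
```

The offsets returned by add() are what a shader would need in order to find 
its tile inside the shared buffer.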

There are many ways to improve the OGL pipeline using new OGL features, but be 
aware that currently the OGL 
pipeline is the only cross-platform pipeline, and it works on most 
devices (new and old).
A similar architecture issue exists in the d3d pipeline as well, so maybe 
Vulkan will be better; 
it could replace XRender and D3D someday?

I thought Vulkan superseded the OGL, no?


Which leads me to the next question:
The pipeline currently is written in rather ancient OpenGL-2.x style.
Once the need to support old OSX versions is gone, the OpenGL pipeline would 
not be enabled by default on any platform.
Once this is the case, would it be feasible to clean up the code by making use 
of modern GL and by leaving legacy platforms behind?
Modern GL has many improvements in areas where the current pipeline is 
suffering a bit (overhead of state changes for small drawing operations, etc).


--
Best regards, Sergey.
