On 2018-01-14 12:37 AM, Carsten Haitzler wrote:
On Sat, 13 Jan 2018 09:06:49 -0600 Derek Foreman <der...@osg.samsung.com> said:
<Heavily trimmed, as I wish to avoid most of the technical and
non-technical debate and address one point>
On 2018-01-13 12:37 AM, Carsten Haitzler wrote:
On Fri, 12 Jan 2018 19:48:15 +0000 Mike Blumenkrantz
<michael.blumenkra...@gmail.com> said:
On Fri, Jan 12, 2018 at 9:45 AM Carsten Haitzler <ras...@rasterman.com>
wrote:
On Fri, 12 Jan 2018 13:48:46 +0000 Stephen Houston <smhousto...@gmail.com>
said:
Both of these cases are solved by rendering the buffers (both gl and
software) directly in the outer compositor; see
https://phab.enlightenment.org/T6592 on the gadgets workboard for this task,
which is already nearing completion.
that doesn't change anything. some work somewhere has to either copy the
data to video memory on every change OR the gpu will likely have to
composite and thus access data over a bus (e.g. over the pci bus). if it's
an embedded system all memory is equal, but subsurfaces are unlikely to
help because you'll not have enough hw layers to assign to a host of
gadgets. there will generally be maybe 2-5 ... maybe on a good day 8 or 10
layers, and these are far better used for application windows (e.g.
separating desktop, active/focused window or window being moved/dragged,
etc. etc.). it's still a gadget rendering to a buffer THEN that buffer
having to be copied again ... always. at all times.
Hey Raster,
The trick here is that a wl_shm or dmabuf will never actually be
rendered by the *nested* compositor.
someone has to render it to the final buffer/screen. either the MAIN compositor
has to do this or it's assigned to a hw layer (thus a hardware compositor does
it). in the end it adds up to an "always there" extra buffer or set of buffers
that are swapped. with modules, where e is doing all the rendering, there is no
such buffer (unless we use map or proxies and force it as a choice). i'm just
pointing out that there is a cost and it's not free. but it's also "always
there", as opposed to rendering done in enlightenment where it can be a
choice turned on and off at will (or on the fly).
The nested compositor will simply create a proxy buffer and use the
existing "hardware" plane infrastructure to place this on a subsurface
(I'm calling these virtual hardware planes). It won't even open the
file descriptors, it'll just create parent compositor objects for them
and proxy along placement and damage requests.
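
Roughly, that forwarding could look like the sketch below. This is
illustrative only: the proxied_gadget struct and the parent_* names are
invented here, not actual E code. The point is that the nested compositor
relays placement and damage onto a subsurface it holds on the parent
connection, and never maps the pixels itself:

#include <wayland-client.h>

/* hypothetical state: the proxies the nested compositor holds on its
 * *parent* connection for one gadget surface */
struct proxied_gadget {
   struct wl_surface    *parent_surface;
   struct wl_subsurface *parent_subsurface;
   struct wl_buffer     *parent_buffer; /* proxy for the client's buffer */
};

/* forward one client commit: the pixel data is never mapped or copied
 * by the nested compositor - only placement and damage are relayed */
static void
gadget_forward_commit(struct proxied_gadget *g, int32_t px, int32_t py,
                      int32_t dx, int32_t dy, int32_t dw, int32_t dh)
{
   wl_subsurface_set_position(g->parent_subsurface, px, py);
   wl_surface_attach(g->parent_surface, g->parent_buffer, 0, 0);
   wl_surface_damage(g->parent_surface, dx, dy, dw, dh);
   wl_surface_commit(g->parent_surface);
}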
yup. i got that... but i directly addressed that. the rendering pipeline here,
in a wayland sense, is (a code sketch of the first copy follows the steps):
COPY src texture/image data -> client visible buffer
0 buffer swapped/sent to compositor (zero copy)
COPY compositor read buffer -> write/render to compositor backbuffer
0 compositor display backbuffer
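
For illustration, that first COPY in the wl_shm path might look something
like this sketch (shm_buffer_with_pixels is a made-up helper; error
handling omitted):

#define _GNU_SOURCE /* for memfd_create() (glibc >= 2.27) */
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <wayland-client.h>

/* copy already-rendered pixels into a fresh shared-memory buffer the
 * compositor can read. the memcpy below is the unavoidable first COPY
 * in the wl_shm pipeline. */
static struct wl_buffer *
shm_buffer_with_pixels(struct wl_shm *shm, const uint32_t *pixels,
                       int width, int height)
{
   int stride = width * 4; /* ARGB8888: 4 bytes per pixel */
   int size = stride * height;
   int fd = memfd_create("gadget-shm", 0);
   ftruncate(fd, size);

   void *map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
   memcpy(map, pixels, size); /* COPY src -> client visible buffer */

   struct wl_shm_pool *pool = wl_shm_create_pool(shm, fd, size);
   struct wl_buffer *buf =
      wl_shm_pool_create_buffer(pool, 0, width, height, stride,
                                WL_SHM_FORMAT_ARGB8888);
   wl_shm_pool_destroy(pool);
   close(fd); /* server keeps its own mapping of the pool */
   return buf;
}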
Yes, true, wl_shm will always take that hit.
Any time we can render into dmabuf instead we can then source the buffer
directly as a texture if the compositor is using gl, and burn fill rate
instead of cpu cycles.
This is all damage tracked, of course, for "minimal" updates - but it
is, as you say, not 100% efficient for bandwidth or for RAM.
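
The zero-copy texture path looks roughly like the sketch below, assuming
the EGL_EXT_image_dma_buf_import and GL_OES_EGL_image extensions are
available (texture_from_dmabuf is a made-up helper, and the buffer is
assumed to be a single-plane ARGB8888 dmabuf):

#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>
#include <drm_fourcc.h> /* from libdrm */

/* wrap a dmabuf as a GL texture: no pixel copy happens, the GPU
 * samples the client's buffer directly when compositing */
static GLuint
texture_from_dmabuf(EGLDisplay dpy, int dmabuf_fd,
                    int width, int height, int stride)
{
   const EGLint attrs[] = {
      EGL_WIDTH, width,
      EGL_HEIGHT, height,
      EGL_LINUX_DRM_FOURCC_EXT, DRM_FORMAT_ARGB8888,
      EGL_DMA_BUF_PLANE0_FD_EXT, dmabuf_fd,
      EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
      EGL_DMA_BUF_PLANE0_PITCH_EXT, stride,
      EGL_NONE
   };
   PFNEGLCREATEIMAGEKHRPROC create_image =
      (PFNEGLCREATEIMAGEKHRPROC)eglGetProcAddress("eglCreateImageKHR");
   PFNGLEGLIMAGETARGETTEXTURE2DOESPROC image_target_texture_2d =
      (PFNGLEGLIMAGETARGETTEXTURE2DOESPROC)
         eglGetProcAddress("glEGLImageTargetTexture2DOES");

   EGLImageKHR img = create_image(dpy, EGL_NO_CONTEXT,
                                  EGL_LINUX_DMA_BUF_EXT, NULL, attrs);
   GLuint tex;
   glGenTextures(1, &tex);
   glBindTexture(GL_TEXTURE_2D, tex);
   image_target_texture_2d(GL_TEXTURE_2D, (GLeglImageOES)img);
   return tex;
}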
What we will lose is:
COPY src texture/image data -> client visible buffer
0 buffer swapped/sent to compositor (zero copy)
COPY nested compositor renders
0 nested compositor buffer swapped/sent to compositor
COPY compositor read buffer -> write/render to compositor backbuffer
0 compositor display backbuffer
where gadgets inside e are:
COPY src texture/image data -> write/render to compositor backbuffer
0 compositor display backbuffer
if you have enough hw layers for each and every gadget buffer (which i doubt,
given that most hw has maybe 2 or 3 layers, sometimes 5 or maybe 8), those
layers would be far more effectively spent on the mouse cursor, focused client
windows, etc... thus the first pipeline is what will really be happening. yes.
if the buffer gets assigned to a hw layer it's
COPY src texture/image data -> client visible buffer
0 buffer swapped/sent to compositor (zero copy)
0 compositor assigns buffer to hw layer
though hw layers aren't totally free either. the hw is dma-scanning them out
to the screen every refresh in addition to all the other hw layers... :) it
depends on where your memory is and so on as to how this impacts things.
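
for reference, the "assigns buffer to hw layer" step above is roughly one
legacy KMS call; the sketch below is illustrative only, with plane_id,
crtc_id and fb_id assumed to come from earlier KMS resource discovery:

#include <xf86drm.h>
#include <xf86drmMode.h>

/* put an already-imported framebuffer (fb_id) on a hw plane; once set,
 * the display hw dma-scans it out every refresh with no gpu composite
 * and no extra copy. src coordinates are 16.16 fixed point. */
static int
scanout_on_plane(int drm_fd, uint32_t plane_id, uint32_t crtc_id,
                 uint32_t fb_id, int x, int y, int w, int h)
{
   return drmModeSetPlane(drm_fd, plane_id, crtc_id, fb_id, 0 /* flags */,
                          x, y, w, h,             /* on-screen rect */
                          0, 0, w << 16, h << 16  /* src rect, 16.16 */);
}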
Right - and most non-rpi hardware I've had my hands on appears to
support 1 small ARGB plane (cursor) and 1 XRGB-capable plane (intended
for video; it also generally supports YUV formats).
The lack of ARGB on the full-size plane makes it useless for general
clients, as they usually want to render drop shadows...
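
Probing this is straightforward with libdrm; a rough standalone sketch
(error handling omitted, /dev/dri/card0 assumed for the sake of example):

#include <fcntl.h>
#include <stdio.h>
#include <drm_fourcc.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

int
main(void)
{
   int fd = open("/dev/dri/card0", O_RDWR);

   /* without this cap the kernel hides primary/cursor planes */
   drmSetClientCap(fd, DRM_CLIENT_CAP_UNIVERSAL_PLANES, 1);

   drmModePlaneResPtr res = drmModeGetPlaneResources(fd);
   for (uint32_t i = 0; i < res->count_planes; i++)
     {
        drmModePlanePtr plane = drmModeGetPlane(fd, res->planes[i]);
        int has_argb = 0;

        for (uint32_t f = 0; f < plane->count_formats; f++)
          if (plane->formats[f] == DRM_FORMAT_ARGB8888) has_argb = 1;
        printf("plane %u: ARGB8888 %s\n", plane->plane_id,
               has_argb ? "yes" : "no");
        drmModeFreePlane(plane);
     }
   drmModeFreePlaneResources(res);
   return 0;
}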
There are a few technical hurdles and optimizations required to get us
there, but that's the tl;dr.
When finished, this should prevent all "double draw" performance losses
as well as allow some surprising tricks, like GL clients proxying
GPU-rendered dmabuf buffers through a SW-rendering nested compositor to
a GL-rendering parent compositor, which can use them directly as textures.
Thanks,
Derek