On 12/11/16 09:31 PM, Carsten Haitzler (The Rasterman) wrote:
> hmmmm. i'm reading this code and it seems broken.
>
> let's start with the simple one.
>
> 1)
> ecore_evas_wayland_common.c: this shouldnt MESS with manual render at all. at
> most it should just have
>
>    if (wdata->reset_pending) return 0;
>
> added to the top "not going to do this" section in 
> _ecore_evas_wl_common_render
> () and remove all the other manual render things. ecore evas already forces 
> all
> rendring to be driven by an animator out of the box. this animator should
> either be totally driven by frame request events from wayland (compositor)
> itself OR at least hook it up to the drm device... but either way this code is
> wrong as it messes ith the manual render flag that is ALSO altered via api 
> from
> ecore_evas "public api" calls which are called by elm win etc.
>
> so this has to go... it's wrong at a minimum  and likely causing issues.

It's known broken and definitely needs to be fixed, as discussed in 
other email.  I'll give it a higher priority.

> 2)
> ecore_evas_animator_tick() has a region but NO TIMESTAMP. please look at
> ecore_x_vsync.c and specifically ecore_loop_time_set() that uses the vblank
> timestamp to get EXACT vblank timing into. this is a must for having a STEADY
> animator with STEADY timepoints. a region - fine... but you need a timestamp
> too.

Fixed in git.

> now to the nasty one...
>
> 3)
> ecore evas drm engine using pageflips for animation. errrr... WHA? no. well 
> ok.
> tell me if i'm wrong. to be able to tick the animator the code DOES a page
> flip... THEN uses the page flip event to drive the animator? what? this is 
> just
> wrong on so many levels. animators are not a RESULT of altering the fb from
> some previous page flip. they are a tick that says "the screen scanout is
> refreshing NOW and you should time your animations NOW to THIS point". i
> suspect this code is a huge source of brokenness. why are we scheduling a
> flip ... we don't even HAVE a new frame yet?

Usually we have a frame before we schedule because the first frame is 
frequently drawn before evas starts to "overtick"...

drmModePageFlip(same buffer currently on screen)
is similar in behaviour to
drmWaitVBlank()

in that it completes at the same with the same timestamp.  However once 
you start the page flip you can't do another one until it completes.

What would happen if we didn't have a new frame before calling 
drmWaitVBlank()?  We wouldn't do anything until the following tick. 
These two methods should be equivalent at the start of animation?

  i know i know... it's "simpler"
> and you can get a flip event per output... BUT firstly the timestamp is not
> even USED from the page flip event in the code i'm reading

Fixed in git by cedric.

, secondly i have no
> idea WHAT timestamp this would be (i know the vblank events are the vblank
> interrupt timestamp). the pageflip COULD be some other time during scanout 
> (end
> of vblank, end of scanout? something else?).

No, it isn't.  it's the same timestamp as vblank.  Exactly the same 
timestamp you'd have received from drmWaitVBlank

  but either way it's not used and
> should absolutely be. but ... in order to GET an event you have to do a 
> flip...
> and i'm reading ecore_drm2_fb_flip() and that is seemingly what is being done
> to just get this event. it seems to be able to do a "null flip" (flip the
> current buffer to itself), but i'm not so sure this is reliable given the api
> and code - is output-next always null when we register an animator with
> _drm_animator_register() ?

output->next will likely go away.  It only exists for when render 
updates come in faster than we draw them - something that should not occur.

> what happens when we do a null page flip THEN do a real one?

You can't do a page flip while a page flip is running - so trying to 
flip again will set that frame as output->next.

I haven't checked recently if we're still seeing any cases where a flip 
is requested during a flip.  If there are no more bugs causing that to 
occur then output->next can go.

It's currently there to ensure that if we get an "extra" frame (ie: one 
that asynchronously arrived and wasn't from a tick), and no following 
frame is provided, then the frame isn't lost.  ie: last frame before the 
animator stops triggered asynchronously, and if we didn't make sure it 
was presented then the user could be staring at a stale frame indefinitely.

> also here comes the stinker. what happens when we MISS a whole frame because
> something went slow? we're scheduling from the mainloop. since we are asking
> for the NEXT animator tick from a page flip... we only schedule a new page 
> flip
> once we already have figured something changed in the ecore_evas or the
> animator is registered etc... imagine for a second we have a 16ms frame gap
> (approx 60hz). what if we spend 18ms this frame rendering or doing something?
> we MISS the next opportunity to page flip, so we actually have to wait another
> 14ms for the next frame. thus we do either 60hz, or 30hz... but can never play
> catch up and squeeze in another frame because we missed the opportunity to
> schedule one. this scheme using page flips is broken "by design" in this way.

I can't follow you here - are you asking for tearing or are you asking 
for incorrect timing?

Assuming page flips for presentation (ie: no tearing, ever), if the 
frame started at vblank N is not ready for scanout at N+1 then it can 
not be display until N+2 without tearing.  There will be an unavoidable 
stutter in framerate, as we have nothing new for N+1.

If I understand you correctly, you then want to use N+1's time as a tick 
then we're going to deliver that frame at N+3, so we've just ticked with 
an incorrect time, but we gave it a lot of slack time to render with.

If frame times are consistently over the line then we'll have a juddered 
output which is likely more annoying to the average user than a 
consistent run of 30fps, and we'll have added 16ms of input latency to 
most, but not all, frames.

If we only see intermittent spikes (say, render takes 10ms, 21ms, 10ms, 
10ms) then we've merely added a timing error.

We only get to "catch up" if we have something like 10ms, 19ms, 19ms, 
10ms, as frame 3 is rendering (with a wrong time) during the missed 
opportunity to present frame 2, but it gets enough time to present.

If we get 10ms, 18ms, 4ms, 10ms render times, then frame 3 is ready to 
display before frame 2 actually hits scanout.  That was cpu time/battery 
well spent.

This isn't really hiding the inability of the render engine to present 
frames fast enough, it's just making it manifest in a different way - 
and with more complicated code.

If I've missed your point and you're actually asking for frame 
presentation at vblank N+1.5 that's obviously going to come with a 
jarring visual artifact and likely be impossible to do with atomic plane 
setting.

> please have a look at ecore_x_vsync.c. yes it's complex. it does whitelisting
> of working drivers and handles some broken drm info stuff. it dlopens libdrm 
> to
> not have to link or compile against it at all. it also handles timeouts on drm
> vblank events and after enough will switch to time based animation. but ignore
> that...

I have looked at that file in the past, it's very hard to read.  It 
would be helpful if there were comments indicating why it does things 
like check if a timestamp is in the future.

> it uses a dedicated thread JUST for scheduling vblank events and reading them
> so it will never miss one. in the dedicated thread ALL it does is schedule a
> vblank and sleep waiting for it to happen (or a command from mainloop to stop
> doing events, timeout in case of screen going off to fake animation ticks 
> while
> vblanks are mising or exit etc.). so it can never (on even a heavily loaded
> system and irrespective of what the mainloop does) miss a frame... unless it

I'm not sure I get your definition of "miss a frame" - you have an 80ms 
render stall on RPI3, is that not several missed frames?

> literally cannot manage to wakeup from the fd coming alive. call 
> drmHandlEvent,
> write the timestamp down an fd, check cancellation flag, the come back to
> scheduling it again within 16ms. you'd have a fundamentally broken system if
> this was not possible as this thread does nothing else. there is a good reason
> it is done this way.
>
> i have found that we HAVE managed to lose frames and drop from 60 to 30hz and
> the effects are not pretty. the drm engine really has to do the same. and i'm
> not so sure you can use page flip events then as you have a completely
> independent thread trying to use the drm fd to get these events and you don't
> want IT flipping pages to the same or a new one... so you likely have to use a
> separate drm fd like ecore_x_vsync.c does and rely on vblank events which are
> passive.

This is a huge amount of complexity to implement a very odd timing 
mechanism. :(

> chances are this code should become common/shared. but what i am reading right
> now smells of broken... and on my rpi i see broken... in fact the mouse just
> doesn't ever (barely) render/move unless i hold it over an ibar icon that is
> pulsing and animating... then it works, but move my mouse away... it basically

That bug existed from Thursday last week to Monday this week.

> doesn't work. i suspect this pageflip "to myself" trick is to blame. yes the
> vblank code only keys off a single screen and will ignore all other vblank
> events. i guess this would have to be figured out somehow.
>

On my RPI3 weston can't update the cursor plane at > 30fps when gl 
rendering, yet somehow manages that ok when run with --use-pixman 
(software render, no GL).  That's just moving the cursor plane around - 
in neither case are any actual pixels being rendered!

--use-pixman is also massively faster when resizing a window.
(somehow weston-simple-egl manages 60fps if it's flipped directly to 
scanout though...)

gl rendering sometimes draws flickers of crap in the cursor plane if any 
pixels are actually rendered.

I'm using raspbian, and haven't done anything special except enable the 
vc4 mesa driver.

I really don't think there's something we're doing to cause these 
performance drops you're seeing.  There's just something wrong with the 
rpi3/raspbian.  You're going to see broken on rpi3 because rpi3 is 
broken.  It's as simple as that.

How about we wait until the platform actually works before trying to 
optimize for it?

------------------------------------------------------------------------------
_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel

Reply via email to