On 12/11/16 09:31 PM, Carsten Haitzler (The Rasterman) wrote: > hmmmm. i'm reading this code and it seems broken. > > let's start with the simple one. > > 1) > ecore_evas_wayland_common.c: this shouldnt MESS with manual render at all. at > most it should just have > > if (wdata->reset_pending) return 0; > > added to the top "not going to do this" section in > _ecore_evas_wl_common_render > () and remove all the other manual render things. ecore evas already forces > all > rendring to be driven by an animator out of the box. this animator should > either be totally driven by frame request events from wayland (compositor) > itself OR at least hook it up to the drm device... but either way this code is > wrong as it messes ith the manual render flag that is ALSO altered via api > from > ecore_evas "public api" calls which are called by elm win etc. > > so this has to go... it's wrong at a minimum and likely causing issues.
It's known broken and definitely needs to be fixed, as discussed in other email. I'll give it a higher priority. > 2) > ecore_evas_animator_tick() has a region but NO TIMESTAMP. please look at > ecore_x_vsync.c and specifically ecore_loop_time_set() that uses the vblank > timestamp to get EXACT vblank timing into. this is a must for having a STEADY > animator with STEADY timepoints. a region - fine... but you need a timestamp > too. Fixed in git. > now to the nasty one... > > 3) > ecore evas drm engine using pageflips for animation. errrr... WHA? no. well > ok. > tell me if i'm wrong. to be able to tick the animator the code DOES a page > flip... THEN uses the page flip event to drive the animator? what? this is > just > wrong on so many levels. animators are not a RESULT of altering the fb from > some previous page flip. they are a tick that says "the screen scanout is > refreshing NOW and you should time your animations NOW to THIS point". i > suspect this code is a huge source of brokenness. why are we scheduling a > flip ... we don't even HAVE a new frame yet? Usually we have a frame before we schedule because the first frame is frequently drawn before evas starts to "overtick"... drmModePageFlip(same buffer currently on screen) is similar in behaviour to drmWaitVBlank() in that it completes at the same with the same timestamp. However once you start the page flip you can't do another one until it completes. What would happen if we didn't have a new frame before calling drmWaitVBlank()? We wouldn't do anything until the following tick. These two methods should be equivalent at the start of animation? i know i know... it's "simpler" > and you can get a flip event per output... BUT firstly the timestamp is not > even USED from the page flip event in the code i'm reading Fixed in git by cedric. , secondly i have no > idea WHAT timestamp this would be (i know the vblank events are the vblank > interrupt timestamp). the pageflip COULD be some other time during scanout > (end > of vblank, end of scanout? something else?). No, it isn't. it's the same timestamp as vblank. Exactly the same timestamp you'd have received from drmWaitVBlank but either way it's not used and > should absolutely be. but ... in order to GET an event you have to do a > flip... > and i'm reading ecore_drm2_fb_flip() and that is seemingly what is being done > to just get this event. it seems to be able to do a "null flip" (flip the > current buffer to itself), but i'm not so sure this is reliable given the api > and code - is output-next always null when we register an animator with > _drm_animator_register() ? output->next will likely go away. It only exists for when render updates come in faster than we draw them - something that should not occur. > what happens when we do a null page flip THEN do a real one? You can't do a page flip while a page flip is running - so trying to flip again will set that frame as output->next. I haven't checked recently if we're still seeing any cases where a flip is requested during a flip. If there are no more bugs causing that to occur then output->next can go. It's currently there to ensure that if we get an "extra" frame (ie: one that asynchronously arrived and wasn't from a tick), and no following frame is provided, then the frame isn't lost. ie: last frame before the animator stops triggered asynchronously, and if we didn't make sure it was presented then the user could be staring at a stale frame indefinitely. > also here comes the stinker. what happens when we MISS a whole frame because > something went slow? we're scheduling from the mainloop. since we are asking > for the NEXT animator tick from a page flip... we only schedule a new page > flip > once we already have figured something changed in the ecore_evas or the > animator is registered etc... imagine for a second we have a 16ms frame gap > (approx 60hz). what if we spend 18ms this frame rendering or doing something? > we MISS the next opportunity to page flip, so we actually have to wait another > 14ms for the next frame. thus we do either 60hz, or 30hz... but can never play > catch up and squeeze in another frame because we missed the opportunity to > schedule one. this scheme using page flips is broken "by design" in this way. I can't follow you here - are you asking for tearing or are you asking for incorrect timing? Assuming page flips for presentation (ie: no tearing, ever), if the frame started at vblank N is not ready for scanout at N+1 then it can not be display until N+2 without tearing. There will be an unavoidable stutter in framerate, as we have nothing new for N+1. If I understand you correctly, you then want to use N+1's time as a tick then we're going to deliver that frame at N+3, so we've just ticked with an incorrect time, but we gave it a lot of slack time to render with. If frame times are consistently over the line then we'll have a juddered output which is likely more annoying to the average user than a consistent run of 30fps, and we'll have added 16ms of input latency to most, but not all, frames. If we only see intermittent spikes (say, render takes 10ms, 21ms, 10ms, 10ms) then we've merely added a timing error. We only get to "catch up" if we have something like 10ms, 19ms, 19ms, 10ms, as frame 3 is rendering (with a wrong time) during the missed opportunity to present frame 2, but it gets enough time to present. If we get 10ms, 18ms, 4ms, 10ms render times, then frame 3 is ready to display before frame 2 actually hits scanout. That was cpu time/battery well spent. This isn't really hiding the inability of the render engine to present frames fast enough, it's just making it manifest in a different way - and with more complicated code. If I've missed your point and you're actually asking for frame presentation at vblank N+1.5 that's obviously going to come with a jarring visual artifact and likely be impossible to do with atomic plane setting. > please have a look at ecore_x_vsync.c. yes it's complex. it does whitelisting > of working drivers and handles some broken drm info stuff. it dlopens libdrm > to > not have to link or compile against it at all. it also handles timeouts on drm > vblank events and after enough will switch to time based animation. but ignore > that... I have looked at that file in the past, it's very hard to read. It would be helpful if there were comments indicating why it does things like check if a timestamp is in the future. > it uses a dedicated thread JUST for scheduling vblank events and reading them > so it will never miss one. in the dedicated thread ALL it does is schedule a > vblank and sleep waiting for it to happen (or a command from mainloop to stop > doing events, timeout in case of screen going off to fake animation ticks > while > vblanks are mising or exit etc.). so it can never (on even a heavily loaded > system and irrespective of what the mainloop does) miss a frame... unless it I'm not sure I get your definition of "miss a frame" - you have an 80ms render stall on RPI3, is that not several missed frames? > literally cannot manage to wakeup from the fd coming alive. call > drmHandlEvent, > write the timestamp down an fd, check cancellation flag, the come back to > scheduling it again within 16ms. you'd have a fundamentally broken system if > this was not possible as this thread does nothing else. there is a good reason > it is done this way. > > i have found that we HAVE managed to lose frames and drop from 60 to 30hz and > the effects are not pretty. the drm engine really has to do the same. and i'm > not so sure you can use page flip events then as you have a completely > independent thread trying to use the drm fd to get these events and you don't > want IT flipping pages to the same or a new one... so you likely have to use a > separate drm fd like ecore_x_vsync.c does and rely on vblank events which are > passive. This is a huge amount of complexity to implement a very odd timing mechanism. :( > chances are this code should become common/shared. but what i am reading right > now smells of broken... and on my rpi i see broken... in fact the mouse just > doesn't ever (barely) render/move unless i hold it over an ibar icon that is > pulsing and animating... then it works, but move my mouse away... it basically That bug existed from Thursday last week to Monday this week. > doesn't work. i suspect this pageflip "to myself" trick is to blame. yes the > vblank code only keys off a single screen and will ignore all other vblank > events. i guess this would have to be figured out somehow. > On my RPI3 weston can't update the cursor plane at > 30fps when gl rendering, yet somehow manages that ok when run with --use-pixman (software render, no GL). That's just moving the cursor plane around - in neither case are any actual pixels being rendered! --use-pixman is also massively faster when resizing a window. (somehow weston-simple-egl manages 60fps if it's flipped directly to scanout though...) gl rendering sometimes draws flickers of crap in the cursor plane if any pixels are actually rendered. I'm using raspbian, and haven't done anything special except enable the vc4 mesa driver. I really don't think there's something we're doing to cause these performance drops you're seeing. There's just something wrong with the rpi3/raspbian. You're going to see broken on rpi3 because rpi3 is broken. It's as simple as that. How about we wait until the platform actually works before trying to optimize for it? ------------------------------------------------------------------------------ _______________________________________________ enlightenment-devel mailing list enlightenment-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/enlightenment-devel