Kristian Høgsberg skrev:
> This mail is getting out of control... too many sub-threads going
> on...  maybe we should break it out and talk about events vs kernel
> scheduling and wait with the patch review until we've figured
> something out.
>   
Sure.


>>>> b) It requires the master to act as a scheduler, and circumvents the DRM
>>>> command submission mechanism through the delayed unpin callback. If this
>>>> is to workaround any inability of GEM to serve a command submission
>>>> while a previous command submission is blocked in the kernel, then IMHO
>>>> that should be fixed and not worked around.
>>>>
>>>>         
>>> It's not about workarounds.  Your suggestion *blocks the hw* while
>>> waiting for vsync.
>>>       
>> No it doesn't. It blocks dri clients when they try to render to old fronts.
>> Other dri clients would continue rendering. It provides a natural migration
>> path to triple buffering where automagically nothing is blocked, and also to
>> advanced software schedulers that can buffer command submissions instead of
>> blocking.
>>     
>
> You advocated blocking the command queue using a wait-for-vblank
> command at some point.
>   

I mentioned an example of how unichromes could handle pageflipping 
with vblank barriers, yes, but I also expressed concerns about the stall. 
That's not what I'm proposing. I sent a mail titled "Pageflipping 
scheduling" a day or so ago to avoid such misunderstandings. That email 
details what I'm suggesting.

>   
>> Now the concern about GEM was that if the kernel takes a global mutex
>> _before_ blocking a client and doesn't release that mutex when the client is
>> blocked, all rendering will naturally be blocked as a consequence.
>>     
>
> That's not how the patch works.  We hold no locks while the client is blocked.
>   
I was not referring to your patch but to _kernel_ blocking with GEM. My 
concern was that GEM might block all clients instead of just the client 
trying to render to the old front. My initial question was whether the 
DRI2 blocking implementation was a workaround for that.
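
To spell out that concern with a purely hypothetical sketch (the struct 
members and the wait queue below are made up; only the locking pattern 
matters):

/* Sleeping with the global lock held stalls every client: */
mutex_lock(&dev->struct_mutex);
wait_event(bo->unpin_queue, !bo->is_old_front);   /* all rendering stops */
mutex_unlock(&dev->struct_mutex);

/* Selective blocking: drop the lock before sleeping, so only the
 * client touching the old front goes to sleep: */
mutex_lock(&dev->struct_mutex);
/* ... validate buffers, build the command submission ... */
mutex_unlock(&dev->struct_mutex);
wait_event(bo->unpin_queue, !bo->is_old_front);   /* only this client waits */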

>> What my suggestion *does* is to block the X server if AIGLX tries to render
>> to the old front, and the command scheduler is simple. Now I'm not
>> completely sure how serious that is, given that AIGLX will, as stated
>> before, frequently block the X server anyway since it's not running in a
>> separate thread.
>>     
>
> Are you talking about an AIGLX client submitting an expensive shader
> locking up the X server for a long time?  I'm not aware of any way
> AIGLX can block the server for up to 20ms at a time right now, but
> that is what it'll do if it blocks on vsync.
>
>   
No, I was mostly referring to drivers doing swapbuffer throttling. A 
gpu-bound app at a low frame rate will certainly block for longer than that.
Anyway, regardless of this, AIGLX is special in that there actually is a 
benefit to a drm event as an optimization hint, since an AIGLX swap would 
otherwise block the whole X server.

>>>> c) The implementation will mostly be worked around with capable
>>>> schedulers and hardware.
>>>>
>>>>         
>>> This is not about the capability of the hw or the sw.  The design is
>>> deliberate.
>>>
>>>       
>> What I mean is that if I don't want the kernel code to do delayed unpins,
>> because my cs already handles that, and I don't want the X server to block
>> clients because my cs or hardware will handle that, I would do my very best
>> to work around this code.
>>     
>
> I don't see the difference between a delayed unpin and a fence.  We're
> doing the same thing, why is it ok if we call it a fence?  You were
> saying that we should make the vsync interrupt look like a sw command
> queue and use that to fence the scan-out buffer right?  Is that really
> better?  I feel a bit like we're getting dragged back into the GEM vs
> TTM discussion here.  I have no stake in that battle, I'm just trying
> to work with what's there. I don't think there's a fundamental problem
> nor benefit with either, and what you can do with a TTM fence you can
> do by waiting on the right GEM BO, as far as I understand.
>   

Please, just forget about fences. They are just one of several ways to 
implement selective kernel blocking, and a good example of it. There are 
hopefully ways GEM can do this nicely as well. This has nothing to do 
with GEM vs TTM.

What I'm saying is that if people have a way to handle this in hardware 
or in the kernel more or less for free, then they will try to work around 
the code you are proposing. There is hardware that can send pageflips 
down the command FIFO, and if there are multiple FIFOs that can accept 
pageflip commands, software blocking becomes completely unnecessary.
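
As an illustration only (all of the macro and opcode names below are 
made up), such hardware could queue the flip entirely in the command 
stream:

/* The barrier stalls only this FIFO until vblank; the flip then
 * executes from the FIFO, other FIFOs keep running, and no client
 * ever needs to be blocked in software. */
BEGIN_RING(fifo, 2);
OUT_RING(CMD_VBLANK_BARRIER | crtc_id);
OUT_RING(CMD_PAGEFLIP | new_front_offset);
ADVANCE_RING(fifo);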

>>>> A couple of questions:
>>>> Why are you guys so reluctant to use kernel scheduling for this instead
>>>> of a mix of kernel / user-space scheduling?
>>>>
>>>>         
>>> I'm not opposed to kernel scheduling as well, but userspace needs this
>>> event.  I've made the case for AIGLX in the X server already, and
>>> direct rendering clients (for example, compositors) that want to
>>> handle input for as long as possible and avoid blocking have the same
>>> needs.
>>>
>>>       
>> I've still failed to understand this. Let's _assume_ for a while that the
>> kernel handles scheduling perfectly in a non-blocking fashion, or let's
>> assume we have triple-buffering, or let's assume we have a multi-FIFO card
>> that can do pageflipping and vblank barriers on all FIFOs.
>>
>> Why then exactly are events needed? and why are we required to track the
>> progress of the command fifo with events like jbarnes suggests, and finally
>> why is this mechanism not needed in the non-pageflipping case? If you can
>> give a typical use-case that would probably help a lot and avoid confusion.
>>     
>
> I think we have different applications in mind.  My guess is that
> you're thinking of a typical game workload that tries to render as
> many frames per second as possible.  What I have in mind is an
> application like a compositor or gui type application, where it
> doesn't use the hw much and doesn't spend much time rendering but
> needs to respond to input events and requests from clients.  What I'd
> like to know is, how do you design the main loop of that application
> so that it doesn't spend most of its time blocked in some gl entry
> point waiting for the swap to finish.  It's not an application that's
> looking to queue up as many frames ahead as possible, that only
> introduces lag between the input events and what's on the screen.
>
> I'm not looking for "use threads" and I don't think there currently is
> a way to do this using just the OpenGL/GLX APIs.  I agree that aside
> from the case with AIGLX blocking the server, you're right, we don't
> need the event.  But what I'd like to do is to add a GLX extension
> that lets applications add a file descriptor to their main loop and
> that way discover when the flip is done.  That lets them stay in phase
> with the vsync, and provides a way to avoid spending most of their
> time blocked in an ioctl.  And as I've said before, I'm not opposed to
> doing the scheduling in the kernel, as long as we also get the event,
> so applications have a chance of knowing when they might block.
>   
OK. Now this makes much more sense.
Then I won't argue against the event mechanism, as long as it's used as 
an optimization by clients to help avoid self-blocking, and as long as 
events are sent only when the client requests them.
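
For reference, the kind of main loop I understand you're describing would 
look roughly like this; the flip-done fd and the helper functions are 
made up, only poll() and glXSwapBuffers() are real:

#include <poll.h>
#include <GL/glx.h>

/* Sketch of a compositor main loop driven by a flip-done fd. */
static void main_loop(Display *dpy, GLXDrawable drawable,
                      int x_fd, int flip_fd)
{
    struct pollfd fds[2] = {
        { .fd = x_fd,    .events = POLLIN },
        { .fd = flip_fd, .events = POLLIN },
    };

    for (;;) {
        poll(fds, 2, -1);

        if (fds[0].revents & POLLIN)
            handle_input_and_client_requests();   /* made-up helper */

        if (fds[1].revents & POLLIN) {
            read_flip_done_event(flip_fd);        /* made-up helper */
            repaint();                            /* render the next frame */
            glXSwapBuffers(dpy, drawable);        /* schedule the flip, return */
        }
    }
}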

But IMO it shouldn't be used by the X server to block dri clients, which 
means that rendering should be correct even if there are no kernel 
events. It should be up to the kernel to ensure that.

>   
>>>> If the plan is to eliminate DRI2GetBuffers() once per frame, what will
>>>> then be used to block clients rendering to the old back buffer?
>>>>         
>>> There'll be an event that's sent back after each DRI2SwapBuffer and
>>> the clients will block on receiving that event.  We still need to send
>>> a request to the xserver and receive confirmation that the xserver has
>>> received it before we can render again.
>>>       
>> The above is to make sure the swap is scheduled before any continued
>> rendering, right?
>>     
>
> Yes.
>
>   
>>>  DRI2GetBuffers is a request
>>> that expects a reply and will block the client on the xserver when we
>>> call it.  DRI2SwapBuffers is an async request, ie there's no reply and
>>> calling it won't necessarily block the client.  We still have to wait
>>> for the new event before we can go on rendering, but doing it this way
>>> makes the client and server less tightly coupled.  We may end up doing
>>> the roundtrip between client and server at a point where the client
>>> was going to block anyway (like disk i/o or something) saving a
>>> context switch.
>>>
>>>
>>>       
>> Hmm. I don't understand fully. So up to now, my picture of how a frame was
>> rendered looks like this.
>>
>> swapBuffers();
>> if (check_for_needed_getbuffers())
>>  getbuffers();
>> render();
>> swapBuffers();
>>
>> This is one X call per frame in the steady-state case. Now, where do you add
>> the dri2 pageflip throttling if we don't need to call getbuffers()? Is it in
>> check_for_needed_getbuffers()?
>>     
>
> I described this in more detail and hopefully more coherently in my
> email to Michel.  If that's still not clear, follow up there.
>
>   
I've read the mail and understand the proposal, thanks.
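
As I understand it, the steady state on the client side then becomes 
roughly the following; the names are illustrative rather than the real 
DRI2 client API:

/* One async swap request per frame; the client blocks on the
 * confirmation event instead of on a GetBuffers round trip. */
for (;;) {
    render_frame();
    dri2_swap_buffers(dpy, drawable);       /* async request, no reply */
    do_other_work();                        /* input, disk i/o, ...    */
    wait_for_swap_event(dpy, drawable);     /* block here before       */
                                            /* touching the buffers    */
}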

/Thomas


