Kristian Høgsberg wrote:
> 2009/8/27 Thomas Hellström <tho...@shipmail.org>:
>   
>> Kristian Høgsberg wrote:
>>     
>>> ---
>>>
>>> Okay, here's the patch again with the locking fixed.  Everything else is
>>> the same.  As for the potential problem with polling on the drm fd with
>>> both dri1 and dri2 enabled, I suggest that we just disable the dri1
>>> sigio handler in the xserver.  We'll do this in a commit that precedes the
>>> merge of the page flipping code, so that no server will be able to poll
>>> on the drm fd in both dri1 and dri2 code.
>>>
>>> cheers,
>>> Kristian
>>>
>>>       
>> <Grr>
>>
>> 1) The locking problems are still not fixed. See below. I just did a
>> quick look, so the comment list is not complete.
>>     
>
> It's not as bad as you think.  There is one glitch where I changed
> struct_mutex to mode_config.mutex and left dev->flip_list unprotected.
>  The other issues you comment on should be fine, I've followed up in
> detail below.
>
>   

I actually think it is. See below.

>
>   
>> </Grr>
>>
>> Even if that is fixed, I believe this patch adds a framework that
>> doesn't belong in a modern-day graphical subsystem, for the following
>> reasons, many of them already mentioned, but let me reiterate. (The
>> dri1 compatibility issue will, admittedly, be fixed with your suggestion.)
>>
>> a) It puts requirements on user-space for correct functionality of DRM
>> ioctls. I'm talking about the master needing to parse the event queue.
>>     
>
> Don't we already have a lot of requirements on userspace to use the
> DRM?  It's not exactly trivial to prepare an execbuffer ioctl, or to
> find a connector, crtc and mode and set that, for example.  Part of
> this work is a new libdrm entry point to read and parse the events from
> the fd.  See the dri2-swapbuffer branch in drm.git.
>
>   
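For readers without the dri2-swapbuffer branch handy, reading and parsing events from the fd would presumably look something like the sketch below. This is a hedged userspace illustration only: it assumes each event starts with a header carrying a type and a total length, which is what the patch's drm_read() loop (event->event->length) suggests; the struct and function names here are illustrative, not the actual libdrm API.

```c
/* Illustrative sketch of walking the byte stream drm_read() returns.
 * The header layout is an assumption modeled on the patch. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct drm_event_hdr {
	uint32_t type;
	uint32_t length;	/* total size of this event, header included */
};

/* Invoke cb for every complete event in buf; return the event count. */
static int parse_events(const char *buf, size_t len,
			void (*cb)(const struct drm_event_hdr *))
{
	size_t off = 0;
	int n = 0;

	while (off + sizeof(struct drm_event_hdr) <= len) {
		struct drm_event_hdr hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));	/* avoid alignment traps */
		if (hdr.length < sizeof(hdr) || off + hdr.length > len)
			break;				/* truncated or garbled */
		if (cb)
			cb((const struct drm_event_hdr *)(buf + off));
		off += hdr.length;
		n++;
	}
	return n;
}
```

The self-describing length field is what lets one read() return several variable-sized events back to back, which matches how drm_read() packs them in the patch.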


>> b) It requires the master to act as a scheduler, and circumvents the DRM
>> command submission mechanism through the delayed unpin callback. If this
>> is to workaround any inability of GEM to serve a command submission
>> while a previous command submission is blocked in the kernel, then IMHO
>> that should be fixed and not worked around.
>>     
>
> It's not about workarounds.  Your suggestion *blocks the hw* while
> waiting for vsync.
No, it doesn't. It blocks dri clients when they try to render to old 
fronts. Other dri clients would continue rendering. It provides a 
natural migration path to triple buffering where automagically nothing 
is blocked, and also to advanced software schedulers that can buffer 
command submissions instead of blocking.

Now the concern about GEM was that if the kernel takes a global mutex 
_before_ blocking a client and doesn't release that mutex when the 
client is blocked, all rendering will naturally be blocked as a consequence.

What my suggestion *does* do is block the X server if AIGLX tries to 
render to the old front and the command scheduler is simple. Now I'm 
not completely sure how serious that is, given that AIGLX will, as 
stated before, frequently block the X server anyway, since it's not 
running in a separate thread.

>> c) The implementation will mostly be worked around with capable
>> schedulers and hardware.
>>     
>
> This is not about the capability of the hw or the sw.  The design is 
> deliberate.
>   

What I mean is that if I don't want the kernel code to do delayed 
unpins, because my cs already handles that, and I don't want the X 
server to block clients because my cs or hardware will handle that, I 
would do my very best to work around this code.

>> A couple of questions:
>> Why are you guys so reluctant to use kernel scheduling for this instead
>> of a mix of kernel / user-space scheduling?
>>     
>
> I'm not opposed to kernel scheduling as well, but userspace needs this
> event.  I've made the case for AIGLX in the X server already, and
> direct rendering clients (for example, compositors) that want to
> handle input for as long as possible and avoid blocking have the same
> needs.
>   

I've still failed to understand this. Let's _assume_ for a while that 
the kernel handles scheduling perfectly in a non-blocking fashion, or 
let's assume we have triple-buffering, or let's assume we have a 
multi-FIFO card that can do pageflipping and vblank barriers on all FIFOs.

Why then, exactly, are events needed? And why are we required to track the 
progress of the command fifo with events, like jbarnes suggests? And, 
finally, why is this mechanism not needed in the non-pageflipping case? 
If you can give a typical use-case that would probably help a lot and 
avoid confusion.

>   
>> If the plan is to eliminate DRI2GetBuffers() once per frame, what will
>> then be used to block clients rendering to the old back buffer?
>>     
>
> There'll be an event that's sent back after each DRI2SwapBuffer and
> the clients will block on receiving that event.  We still need to send
> a request to the xserver and receive confirmation that the xserver has
> received it before we can render again. 

The above is to make sure the swap is scheduled before any continued 
rendering, right?

>  DRI2GetBuffers is a request
> that expects a reply and will block the client on the xserver when we
> call it.  DRI2SwapBuffers is an async request, ie there's no reply and
> calling it won't necessarily block the client.  We still have to wait
> for the new event before we can go on rendering, but doing it this way
> makes the client and server less tightly coupled.  We may end up doing
> the roundtrip between client and server at a point where the client
> was going to block anyway (like disk i/o or something) saving a
> context switch.
>
>   

Hmm. I don't fully understand. Up to now, my picture of how a frame 
is rendered has looked like this:

swapBuffers();
if (check_for_needed_getbuffers())
   getbuffers();
render();
swapBuffers();

This is one X call per frame in the steady-state case. Now, where do you 
add the dri2 pageflip throttling if we don't need to call getbuffers()? 
Is it in check_for_needed_getbuffers()?
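To make the question concrete, here is a toy model of the throttling as I understand your description: SwapBuffers becomes an async request, and the client must consume the server's swap event before rendering again. This is purely illustrative; dri2_swap_buffers() and wait_for_swap_event() are hypothetical stand-ins, not real API.

```c
/* Toy model (hypothetical names) of event-based swap throttling:
 * the swap request is async, but rendering the next frame first
 * waits for the confirmation event if one is still outstanding. */
#include <stdbool.h>

static bool swap_pending;	/* a swap event is still unconsumed */
static int frames_rendered;

static void dri2_swap_buffers(void)
{
	swap_pending = true;	/* async request, no reply expected */
}

static void wait_for_swap_event(void)
{
	swap_pending = false;	/* stand-in for reading the event fd */
}

static void render_frame(void)
{
	/* Throttle point: this replaces blocking in a GetBuffers
	 * round trip; ideally the event already arrived while the
	 * client was busy elsewhere and this doesn't block at all. */
	if (swap_pending)
		wait_for_swap_event();
	frames_rendered++;
}
```

In the steady state the loop is render_frame(); dri2_swap_buffers(); — one async request per frame, with the wait folded into the start of the next frame.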



>>
>>     
>>>  drivers/gpu/drm/drm_crtc.c              |  171 ++++++++++++++++++++++++++++++-
>>>  drivers/gpu/drm/drm_crtc_helper.c       |   10 ++
>>>  drivers/gpu/drm/drm_drv.c               |    1 +
>>>  drivers/gpu/drm/drm_fops.c              |   68 ++++++++++++-
>>>  drivers/gpu/drm/drm_irq.c               |   42 ++++++++
>>>  drivers/gpu/drm/i915/i915_drv.c         |    1 +
>>>  drivers/gpu/drm/i915/intel_display.c    |   16 +++-
>>>  drivers/gpu/drm/radeon/radeon_display.c |    3 +-
>>>  include/drm/drm.h                       |   25 +++++
>>>  include/drm/drmP.h                      |   32 ++++++
>>>  include/drm/drm_crtc.h                  |   27 +++++
>>>  include/drm/drm_crtc_helper.h           |    4 +
>>>  include/drm/drm_mode.h                  |   16 +++
>>>  13 files changed, 409 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
>>> index 8fab789..d877c21 100644
>>> --- a/drivers/gpu/drm/drm_crtc.c
>>> +++ b/drivers/gpu/drm/drm_crtc.c
>>> @@ -34,6 +34,8 @@
>>>  #include "drmP.h"
>>>  #include "drm_crtc.h"
>>>
>>> +#undef set_base
>>> +
>>>  struct drm_prop_enum_list {
>>>       int type;
>>>       char *name;
>>> @@ -342,6 +344,35 @@ void drm_framebuffer_cleanup(struct drm_framebuffer *fb)
>>>  EXPORT_SYMBOL(drm_framebuffer_cleanup);
>>>
>>>  /**
>>> + * drm_crtc_async_flip - do a set_base call from a work queue
>>> + * @work: work struct
>>> + *
>>> + * Called when a set_base call is queued by the page flip code.  This
>>> + * allows the flip ioctl itself to return immediately and allow userspace
>>> + * to continue working.
>>> + */
>>> +static void drm_crtc_async_flip(struct work_struct *work)
>>> +{
>>> +     struct drm_crtc *crtc = container_of(work, struct drm_crtc, async_flip);
>>> +     struct drm_device *dev = crtc->dev;
>>> +     struct drm_pending_flip *pending;
>>> +
>>> +     mutex_lock(&dev->mode_config.mutex);
>>> +
>>> +     BUG_ON(crtc->pending_flip == NULL);
>>> +
>>> +     crtc->funcs->set_base(crtc, crtc->x, crtc->y, NULL);
>>> +
>>> +     pending = crtc->pending_flip;
>>> +     crtc->pending_flip = NULL;
>>> +
>>> +     pending->frame = drm_vblank_count(dev, crtc->pipe);
>>>
>>>       
>> What's protecting dev->flip_list here?
>>     
>
> That should be struct_mutex.  As mentioned above, I missed it when I
> changed the mutex to mode_config.mutex.
>   

Hence the locking order question. There may be a need to keep the 
mode_config mutex held _as well_ across the list insertion, since 
crtc->pending_flip->link is modified at list insertion. That really 
depends on what's protecting that member.

>   
>>> +     list_add_tail(&pending->link, &dev->flip_list);
>>> +
>>> +     mutex_unlock(&dev->mode_config.mutex);
>>> +}
>>> +
>>>
>>> +
>>>       
>>>  #define DRM_CORE_IOCTL_COUNT ARRAY_SIZE( drm_ioctls )
>>> diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
>>> index 251bc0e..dcd9c66 100644
>>> --- a/drivers/gpu/drm/drm_fops.c
>>> +++ b/drivers/gpu/drm/drm_fops.c
>>> @@ -257,6 +257,8 @@ static int drm_open_helper(struct inode *inode, struct file *filp,
>>>
>>>       INIT_LIST_HEAD(&priv->lhead);
>>>       INIT_LIST_HEAD(&priv->fbs);
>>> +     INIT_LIST_HEAD(&priv->event_list);
>>> +     init_waitqueue_head(&priv->event_wait);
>>>
>>>       if (dev->driver->driver_features & DRIVER_GEM)
>>>               drm_gem_open(dev, priv);
>>> @@ -429,6 +431,9 @@ int drm_release(struct inode *inode, struct file *filp)
>>>  {
>>>       struct drm_file *file_priv = filp->private_data;
>>>       struct drm_device *dev = file_priv->minor->dev;
>>> +     struct drm_pending_flip *f, *ft;
>>> +     struct drm_pending_event *e, *et;
>>> +
>>>       int retcode = 0;
>>>
>>>       lock_kernel();
>>> @@ -451,6 +456,19 @@ int drm_release(struct inode *inode, struct file *filp)
>>>       if (file_priv->minor->master)
>>>               drm_master_release(dev, filp);
>>>
>>>
>>>       
>> And here? What's the locking order between mode_config mutex and the
>> struct_mutex?
>>     
>
> Here it's protected by struct_mutex as intended.  What is your concern
> about the locking order?
>
>   

See above.

>>> +     mutex_lock(&dev->struct_mutex);
>>> +
>>> +     /* Remove pending flips */
>>> +     list_for_each_entry_safe(f, ft, &dev->flip_list, link)
>>> +             if (f->pending_event.file_priv == file_priv)
>>> +                     drm_finish_pending_flip(dev, f, 0);
>>> +
>>> +     /* Remove unconsumed events */
>>> +     list_for_each_entry_safe(e, et, &file_priv->event_list, link)
>>> +             e->destroy(e);
>>> +
>>> +     mutex_unlock(&dev->struct_mutex);
>>> +
>>>       if (dev->driver->driver_features & DRIVER_GEM)
>>>               drm_gem_release(dev, file_priv);
>>>
>>> @@ -544,9 +562,55 @@ int drm_release(struct inode *inode, struct file *filp)
>>>  }
>>>  EXPORT_SYMBOL(drm_release);
>>>
>>> -/** No-op. */
>>> +ssize_t drm_read(struct file *filp, char __user *buffer,
>>> +              size_t count, loff_t *offset)
>>> +{
>>> +     struct drm_file *file_priv = filp->private_data;
>>> +     struct drm_device *dev = file_priv->minor->dev;
>>> +     struct drm_pending_event *event;
>>> +     ssize_t total, ret;
>>> +
>>> +     ret = wait_event_interruptible(file_priv->event_wait,
>>> +                                    !list_empty(&file_priv->event_list));
>>> +     if (ret < 0)
>>> +             return ret;
>>> +
>>> +     total = 0;
>>>
>>>       
>> What's protecting file_priv->event_list here?
>>     
>
> You can test for list emptiness without taking the lock.
>   

Are you suggesting accessing a member of a mutex-protected struct 
without taking the mutex?
Do you have a pointer to where what you say above is documented?

That would be highly architecture-dependent, and in this particular case 
it would require the processor to do atomic pointer reads and writes, 
forbid processor read/write reordering, and have the compiler treat the 
list head as volatile.

Even if that's the case, let's assume list_empty() returns false, and then 
another thread sneaks in and empties the queue before you take the 
mutex. The next thing you do is access and try to remove "event", 
which doesn't exist because the list is now empty. If you're lucky you'll 
see an oops. If not, you end up with strange memory corruption.

>   
>>> +     while (!list_empty(&file_priv->event_list)) {
>>> +             mutex_lock(&dev->struct_mutex);
>>> +             event = list_first_entry(&file_priv->event_list,
>>> +                                      struct drm_pending_event, link);
>>> +             if (total + event->event->length > count) {
>>> +                     mutex_unlock(&dev->struct_mutex);
>>> +                     break;
>>> +             }
>>> +             list_del(&event->link);
>>> +             mutex_unlock(&dev->struct_mutex);
>>> +
>>> +             if (copy_to_user(buffer + total,
>>> +                              event->event, event->event->length)) {
>>> +                     total = -EFAULT;
>>> +                     break;
>>> +             }
>>> +
>>> +             total += event->event->length;
>>> +             event->destroy(event);
>>> +     }
>>> +
>>> +     return total;
>>> +}
>>> +EXPORT_SYMBOL(drm_read);
>>> +
>>>  unsigned int drm_poll(struct file *filp, struct poll_table_struct *wait)
>>>  {
>>> -     return 0;
>>> +     struct drm_file *file_priv = filp->private_data;
>>> +     unsigned int mask = 0;
>>> +
>>> +     poll_wait(filp, &file_priv->event_wait, wait);
>>> +
>>>
>>>       
>> And here?
>>     
>
> Again, testing list emptiness.
>
>   

See above. Although here, the consequences probably aren't that fatal.

>>> +     if (!list_empty(&file_priv->event_list))
>>> +             mask |= POLLIN | POLLRDNORM;
>>> +
>>> +     return mask;
>>>  }
>>>  EXPORT_SYMBOL(drm_poll);
>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>> index b4a3dbc..a0c120c 100644
>>> --- a/drivers/gpu/drm/drm_irq.c
>>> +++ b/drivers/gpu/drm/drm_irq.c
>>> @@ -34,6 +34,7 @@
>>>   */
>>>
>>>  #include "drmP.h"
>>> +#include "drm_crtc_helper.h"
>>>
>>>  #include <linux/interrupt.h> /* For task queue support */
>>>
>>> @@ -71,6 +72,44 @@ int drm_irq_by_busid(struct drm_device *dev, void *data,
>>>       return 0;
>>>  }
>>>
>>> +#define vblank_passed(a,b) ((long)(a - b) > 0)
>>> +
>>> +void drm_finish_pending_flip(struct drm_device *dev,
>>> +                          struct drm_pending_flip *f, u32 frame)
>>> +{
>>> +     struct timeval now;
>>> +
>>> +     f->event.frame = frame;
>>> +     do_gettimeofday(&now);
>>> +     f->event.tv_sec = now.tv_sec;
>>> +     f->event.tv_usec = now.tv_usec;
>>> +     drm_vblank_put(dev, f->pipe);
>>> +     list_del_init(&f->link);
>>>
>>>       
>> What's protecting the file_priv->event_list here?
>>     
>
> This function is always called with the struct_mutex held.
>   

No. It's not. It's called with the mode_config mutex held.


--
_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
