Peter Zijlstra wrote: > On Tue, 2009-02-17 at 16:59 -0800, Eric Anholt wrote: > >> The basic problem was >> mmap_sem (do_mmap()) -> struct_mutex (drm_gem_mmap(), i915_gem_fault()) >> struct_mutex (i915_gem_execbuffer()) -> mmap_sem (copy_from/to_user()) >> > > That's not the only problem, there's also: > > dup_mmap() > down_write(mmap_sem) > vm_ops->open() -> drm_vm_open() > mutex_lock(struct_mutex); > > >> We have plenty of places where we want to hold device state the same >> (struct_mutex) while we move a non-trivial amount of data >> (copy_from/to_user()), such as i915_gem_pwrite(). Solve this by moving the >> easy things that needed struct_mutex with mmap_sem held to using a lock to >> cover just those data structures (offset hash and offset manager), and do >> trylock and reschedule in fault. >> > > So we establish, > > mmap_sem > offset_mutex > > i915_gem_mmap_gtt_ioctl() > mutex_lock(struct_mutex) > i915_gem_create_mmap_offset() > mutex_lock(offset_mutex) > > However we still have > > struct_mutex > mmap_sem > > in basically every copy_*_user() case > > But you cannot seem to switch ->fault() to use offset_mutex, which would > work out the inversion because you then have: > > struct_mutex > mmap_sem > offset_mutex > > So why bother with the offset_mutex? Instead you make your ->fault() > fail randomly. > > I'm not sure what Wang Chen sees after this patch, but I should not be > the exact same splat, still it would not at all surprise me if there's > plenty left. > > The locking looks very fragile and I don't think this patch is helping > anything, sorry. > > It looks to me like the driver preferred locking order is
object_mutex (which happens to be the device global struct_mutex) mmap_sem offset_mutex. So if one could avoid using the struct_mutex for object bookkeeping (A separate lock) then vm_open() and vm_close() would adhere to that locking order as well, simply by not taking the struct_mutex at all. So only fault() remains, in which that locking order is reversed. Personally I think the trylock ->reschedule->retry method with proper commenting is a good solution. It will be the _only_ place where locking order is reversed and it is done in a deadlock-safe manner. Note that fault() doesn't really fail, but requests a retry from user-space with rescheduling to give the process holding the struct_mutex time to release it. /Thomas ------------------------------------------------------------------------------ Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA -OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise -Strategies to boost innovation and cut costs with open source participation -Receive a $600 discount off the registration fee with the source code: SFAD http://p.sf.net/sfu/XcvMzF8H -- _______________________________________________ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel