Re: Deadlock cpuctx_mutex / pmus_lock / &mm->mmap_lock#2

2020-11-19 Thread Thomas Gleixner
On Thu, Nov 19 2020 at 13:25, Chris Wilson wrote:
> Quoting Peter Zijlstra (2020-11-19 13:02:44)
>> 
>> Chris, I suspect this is due to i915 calling stop machine with all sorts
>> of locks held. Is there anything to be done about this? stop_machine()
>> is really nasty to begin with.
>> 
>> What problem is it trying to solve?
>
> If there is any concurrent access through a PCI BAR (that is exported to
> userspace via mmap) while the GTT is updated, it results in undefined HW
> behaviour (and that is not limited to users writing to other system
> pages).
>
> stop_machine() is the most foolproof method we know that works.

It's also the biggest hammer, and it is going to cause latencies even on
CPUs which are not involved at all. We already have enough trouble with
WBINVD latency-wise, so there is no need to add yet another way to hurt
everyone.

As the gfx muck knows which processes have stuff mapped, there are
certainly ways to make them, and only them, rendezvous, and to do so
while staying preemptible otherwise. It might take a RESCHED_IPI to all
CPUs to achieve that, but that's a cheap operation compared to what you
want to do.
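
For illustration only, a minimal sketch of such a targeted rendezvous,
assuming the driver can compute the cpumask of CPUs that currently have
the BAR mapped. All names below (gtt_rendezvous, gtt_update_targeted,
the mask itself) are hypothetical, and real code would additionally need
memory barriers and serialization of concurrent updaters:

#include <linux/atomic.h>
#include <linux/cpumask.h>
#include <linux/smp.h>

static atomic_t gtt_cpus_parked;
static atomic_t gtt_update_done;

/* IPI handler: park this CPU (interrupts off) until the update is done. */
static void gtt_rendezvous(void *info)
{
        atomic_inc(&gtt_cpus_parked);
        while (!atomic_read(&gtt_update_done))
                cpu_relax();
}

/*
 * Rendezvous only the CPUs in @mapped_cpus while @update runs; all other
 * CPUs stay fully preemptible, unlike with stop_machine().
 */
static void gtt_update_targeted(struct cpumask *mapped_cpus,
                                void (*update)(void *), void *arg)
{
        int cpu = get_cpu();                    /* pin us, no preemption */
        unsigned int nr;

        cpumask_clear_cpu(cpu, mapped_cpus);    /* don't IPI ourselves */
        nr = cpumask_weight(mapped_cpus);

        atomic_set(&gtt_cpus_parked, 0);
        atomic_set(&gtt_update_done, 0);

        /* wait=false: the handlers spin in gtt_rendezvous() while we work. */
        on_each_cpu_mask(mapped_cpus, gtt_rendezvous, NULL, false);

        /* Wait until every targeted CPU is parked in its IPI handler. */
        while (atomic_read(&gtt_cpus_parked) != nr)
                cpu_relax();

        update(arg);                            /* rewrite the GTT */

        atomic_set(&gtt_update_done, 1);        /* release the spinners */
        put_cpu();
}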

Thanks,

tglx




Re: Deadlock cpuctx_mutex / pmus_lock / &mm->mmap_lock#2

2020-11-19 Thread Peter Zijlstra
On Thu, Nov 19, 2020 at 03:19:14PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 19, 2020 at 01:25:11PM +0000, Chris Wilson wrote:
> > Quoting Peter Zijlstra (2020-11-19 13:02:44)
> > > 
> > > Chris, I suspect this is due to i915 calling stop machine with all sorts
> > > of locks held. Is there anything to be done about this? stop_machine()
> > > is really nasty to begin with.
> > > 
> > > What problem is it trying to solve?
> > 
> > If there is any concurrent access through a PCI BAR (that is exported to
> > userspace via mmap) while the GTT is updated, it results in undefined HW
> > behaviour (and that is not limited to users writing to other system
> > pages).
> > 
> > stop_machine() is the most foolproof method we know that works.
> 
> Sorry, I don't understand. It tries to do what? And why does it need to
> do that while holding locks?
> 
> Really, this is very bad form.

Having poked around at the code, do I get it right that this is using
stop-machine to set IOMMU page-table entries, because the hardware
cannot deal with two CPUs writing to the same device page-tables, which
would be possible because that memory is exposed through PCI BARs?

Can't you simply exclude that memory from being visible through the PCI
BAR crud? Having to use stop-machine seems tragic, doubly so because
nobody should actually be having that memory mapped in the first place.




Re: Deadlock cpuctx_mutex / pmus_lock / &mm->mmap_lock#2

2020-11-19 Thread Peter Zijlstra
On Thu, Nov 19, 2020 at 01:25:11PM +0000, Chris Wilson wrote:
> Quoting Peter Zijlstra (2020-11-19 13:02:44)
> > 
> > Chris, I suspect this is due to i915 calling stop machine with all sorts
> > of locks held. Is there anything to be done about this? stop_machine()
> > is really nasty to begin with.
> > 
> > What problem is it trying to solve?
> 
> If there is any concurrent access through a PCI BAR (that is exported to
> userspace via mmap) while the GTT is updated, it results in undefined HW
> behaviour (and that is not limited to users writing to other system
> pages).
> 
> stop_machine() is the most foolproof method we know that works.

Sorry, I don't understand. It tries to do what? And why does it need to
do that while holding locks?

Really, this is very bad form.

> This particular cycle is easy to break by moving the copy_to_user() to
> after the perf_event_ctx_unlock() in perf_read().

The splat in question is about the ioctl()s, but yeah, that too. Not sure
how easy that is. I'm also not sure it'll solve your problem:
cpu_hotplug_lock is a big lock, and there's tons of stuff inside it.
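
For illustration, the reordering Chris describes would look roughly like
this: stage the counter values in a kernel buffer under the ctx lock and
only touch user memory (and hence mmap_lock, via possible faults) after
the unlock. __perf_read_to_buf() is a hypothetical variant of
__perf_read() that fills a kernel buffer instead of calling
copy_to_user():

static ssize_t
perf_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
{
        struct perf_event *event = file->private_data;
        struct perf_event_context *ctx;
        u64 values[64];         /* assumed big enough for the read_format */
        ssize_t ret;

        /* (security checks from the real perf_read() omitted for brevity) */

        ctx = perf_event_ctx_lock(event);
        /* No user memory is touched here, so no mmap_lock under ctx mutex. */
        ret = __perf_read_to_buf(event, values,
                                 min_t(size_t, count, sizeof(values)));
        perf_event_ctx_unlock(event, ctx);

        /* The fault-prone copy now happens without any perf locks held. */
        if (ret > 0 && copy_to_user(buf, values, ret))
                ret = -EFAULT;

        return ret;
}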


Re: Deadlock cpuctx_mutex / pmus_lock / &mm->mmap_lock#2

2020-11-19 Thread Chris Wilson
Quoting Peter Zijlstra (2020-11-19 13:02:44)
> 
> Chris, I suspect this is due to i915 calling stop machine with all sorts
> of locks held. Is there anything to be done about this? stop_machine()
> is really nasty to begin with.
> 
> What problem is it trying to solve?

If there is any concurrent access through a PCI BAR (that is exported to
userspace via mmap) while the GTT is updated, it results in undefined HW
behaviour (and that is not limited to users writing to other system
pages).

stop_machine() is the most foolproof method we know that works.
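
For context, the path that shows up in the lockdep chain wraps the GGTT
PTE writes in stop_machine() roughly like this (a simplified sketch of
the i915 VT-d workaround, not the verbatim driver code):

struct insert_entries {
        struct i915_address_space *vm;
        struct i915_vma *vma;
        enum i915_cache_level level;
        u32 flags;
};

/*
 * Runs with every other CPU spinning in stop_machine(), so nothing can
 * race through the BAR while the GTT entries are rewritten.
 */
static int bxt_vtd_ggtt_insert_entries__cb(void *_arg)
{
        struct insert_entries *arg = _arg;

        gen8_ggtt_insert_entries(arg->vm, arg->vma, arg->level, arg->flags);
        return 0;
}

static void bxt_vtd_ggtt_insert_entries__BKL(struct i915_address_space *vm,
                                             struct i915_vma *vma,
                                             enum i915_cache_level level,
                                             u32 flags)
{
        struct insert_entries arg = { vm, vma, level, flags };

        /* This is what drags cpu_hotplug_lock into the dependency chain. */
        stop_machine(bxt_vtd_ggtt_insert_entries__cb, &arg, NULL);
}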

This particular cycle is easy to break by moving the copy_to_user() to
after the perf_event_ctx_unlock() in perf_read().
-Chris


Re: Deadlock cpuctx_mutex / pmus_lock / &mm->mmap_lock#2

2020-11-19 Thread Peter Zijlstra


Chris, I suspect this is due to i915 calling stop machine with all sorts
of locks held. Is there anything to be done about this? stop_machine()
is really nasty to begin with.

What problem is it trying to solve?

On Thu, Nov 19, 2020 at 12:04:56AM +0100, Heiner Kallweit wrote:
> Just got the following when running perf.
> 
> [  648.247718] ==
> [  648.247725] WARNING: possible circular locking dependency detected
> [  648.247734] 5.10.0-rc4-next-20201118+ #1 Not tainted
> [  648.247740] --
> [  648.247748] perf/19761 is trying to acquire lock:
> [  648.247755] a00200abad18 (&mm->mmap_lock#2){++++}-{3:3}, at: __might_fault+0x2f/0x80
> [  648.24]
>    but task is already holding lock:
> [  648.247785] a0027bc2edb0 (&cpuctx_mutex){+.+.}-{3:3}, at: perf_event_ctx_lock_nested+0xd8/0x1f0
> [  648.247801]
>    which lock already depends on the new lock.
> 
> [  648.247810]
>    the existing dependency chain (in reverse order) is:
> [  648.247818]
>    -> #5 (&cpuctx_mutex){+.+.}-{3:3}:
> [  648.247834]__mutex_lock+0x88/0x900
> [  648.247840]mutex_lock_nested+0x16/0x20
> [  648.247848]perf_event_init_cpu+0x89/0x140
> [  648.247857]perf_event_init+0x172/0x1a0
> [  648.247864]start_kernel+0x655/0x7de
> [  648.247871]x86_64_start_reservations+0x24/0x26
> [  648.247878]x86_64_start_kernel+0x70/0x74
> [  648.247887]secondary_startup_64_no_verify+0xb0/0xbb
> [  648.247894]
>    -> #4 (pmus_lock){+.+.}-{3:3}:
> [  648.247907]__mutex_lock+0x88/0x900
> [  648.247914]mutex_lock_nested+0x16/0x20
> [  648.247921]perf_event_init_cpu+0x52/0x140
> [  648.247929]cpuhp_invoke_callback+0xa4/0x810
> [  648.247937]_cpu_up+0xaa/0x150
> [  648.247943]cpu_up+0x79/0x90
> [  648.247949]bringup_nonboot_cpus+0x4d/0x60
> [  648.247958]smp_init+0x25/0x65
> [  648.247964]kernel_init_freeable+0x144/0x267
> [  648.247972]kernel_init+0x9/0xf8
> [  648.247978]ret_from_fork+0x22/0x30
> [  648.247984]
>    -> #3 (cpu_hotplug_lock){++++}-{0:0}:
> [  648.247998]cpus_read_lock+0x38/0xb0
> [  648.248006]stop_machine+0x18/0x40
> [  648.248075]bxt_vtd_ggtt_insert_entries__BKL+0x37/0x50 [i915]
> [  648.248129]ggtt_bind_vma+0x43/0x60 [i915]
> [  648.248192]__vma_bind+0x38/0x40 [i915]
> [  648.248242]fence_work+0x21/0xac [i915]
> [  648.248292]fence_notify+0x95/0x134 [i915]
> [  648.248342]__i915_sw_fence_complete+0x3b/0x1d0 [i915]
> [  648.248394]i915_sw_fence_commit+0x12/0x20 [i915]
> [  648.248458]i915_vma_pin_ww+0x25c/0x8c0 [i915]
> [  648.248520]i915_ggtt_pin+0x52/0xf0 [i915]
> [  648.248576]intel_ring_pin+0x5b/0x110 [i915]
> [  648.248628]__intel_context_do_pin_ww+0xd3/0x510 [i915]
> [  648.248681]__intel_context_do_pin+0x55/0x90 [i915]
> [  648.248734]intel_engines_init+0x43d/0x570 [i915]
> [  648.248787]intel_gt_init+0x119/0x2d0 [i915]
> [  648.248848]i915_gem_init+0x133/0x1c0 [i915]
> [  648.248895]i915_driver_probe+0x68d/0xc90 [i915]
> [  648.248943]i915_pci_probe+0x45/0x120 [i915]
> [  648.248952]pci_device_probe+0xd8/0x150
> [  648.248960]really_probe+0x259/0x460
> [  648.248967]driver_probe_device+0x50/0xb0
> [  648.248973]device_driver_attach+0xad/0xc0
> [  648.248980]__driver_attach+0x75/0x110
> [  648.248988]bus_for_each_dev+0x7c/0xc0
> [  648.248995]driver_attach+0x19/0x20
> [  648.249001]bus_add_driver+0x117/0x1c0
> [  648.249008]driver_register+0x8c/0xe0
> [  648.249015]__pci_register_driver+0x6e/0x80
> [  648.249022]0xc0a5c061
> [  648.249028]do_one_initcall+0x5a/0x2c0
> [  648.249036]do_init_module+0x5d/0x240
> [  648.249043]load_module+0x2367/0x2710
> [  648.249049]__do_sys_finit_module+0xb6/0xf0
> [  648.249056]__x64_sys_finit_module+0x15/0x20
> [  648.249064]do_syscall_64+0x38/0x50
> [  648.249071]entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  648.249078]
>    -> #2 (reservation_ww_class_mutex){+.+.}-{3:3}:
> [  648.249093]__ww_mutex_lock.constprop.0+0xac/0x1090
> [  648.249100]ww_mutex_lock+0x3d/0xa0
> [  648.249108]dma_resv_lockdep+0x141/0x281
> [  648.249114]do_one_initcall+0x5a/0x2c0
> [  648.249121]kernel_init_freeable+0x220/0x267
> [  648.249129]kernel_init+0x9/0xf8
> [  648.249135]ret_from_fork+0x22/0x30
> [  648.249140]
>    -> #1 (reservation_ww_class_acquire){+.+.}-{0:0}:
> [  648.249155]dma_resv_lockdep+0x115/0x281
> [  648.249162]do_one_initcall+0x5a/0x2c0
> [  648.249168]

Deadlock cpuctx_mutex / pmus_lock / &mm->mmap_lock#2

2020-11-18 Thread Heiner Kallweit
Just got the following when running perf.

[  648.247718] ==
[  648.247725] WARNING: possible circular locking dependency detected
[  648.247734] 5.10.0-rc4-next-20201118+ #1 Not tainted
[  648.247740] --
[  648.247748] perf/19761 is trying to acquire lock:
[  648.247755] a00200abad18 (&mm->mmap_lock#2){++++}-{3:3}, at: __might_fault+0x2f/0x80
[  648.24]
   but task is already holding lock:
[  648.247785] a0027bc2edb0 (&cpuctx_mutex){+.+.}-{3:3}, at: perf_event_ctx_lock_nested+0xd8/0x1f0
[  648.247801]
   which lock already depends on the new lock.

[  648.247810]
   the existing dependency chain (in reverse order) is:
[  648.247818]
   -> #5 (&cpuctx_mutex){+.+.}-{3:3}:
[  648.247834]__mutex_lock+0x88/0x900
[  648.247840]mutex_lock_nested+0x16/0x20
[  648.247848]perf_event_init_cpu+0x89/0x140
[  648.247857]perf_event_init+0x172/0x1a0
[  648.247864]start_kernel+0x655/0x7de
[  648.247871]x86_64_start_reservations+0x24/0x26
[  648.247878]x86_64_start_kernel+0x70/0x74
[  648.247887]secondary_startup_64_no_verify+0xb0/0xbb
[  648.247894]
   -> #4 (pmus_lock){+.+.}-{3:3}:
[  648.247907]__mutex_lock+0x88/0x900
[  648.247914]mutex_lock_nested+0x16/0x20
[  648.247921]perf_event_init_cpu+0x52/0x140
[  648.247929]cpuhp_invoke_callback+0xa4/0x810
[  648.247937]_cpu_up+0xaa/0x150
[  648.247943]cpu_up+0x79/0x90
[  648.247949]bringup_nonboot_cpus+0x4d/0x60
[  648.247958]smp_init+0x25/0x65
[  648.247964]kernel_init_freeable+0x144/0x267
[  648.247972]kernel_init+0x9/0xf8
[  648.247978]ret_from_fork+0x22/0x30
[  648.247984]
   -> #3 (cpu_hotplug_lock){++++}-{0:0}:
[  648.247998]cpus_read_lock+0x38/0xb0
[  648.248006]stop_machine+0x18/0x40
[  648.248075]bxt_vtd_ggtt_insert_entries__BKL+0x37/0x50 [i915]
[  648.248129]ggtt_bind_vma+0x43/0x60 [i915]
[  648.248192]__vma_bind+0x38/0x40 [i915]
[  648.248242]fence_work+0x21/0xac [i915]
[  648.248292]fence_notify+0x95/0x134 [i915]
[  648.248342]__i915_sw_fence_complete+0x3b/0x1d0 [i915]
[  648.248394]i915_sw_fence_commit+0x12/0x20 [i915]
[  648.248458]i915_vma_pin_ww+0x25c/0x8c0 [i915]
[  648.248520]i915_ggtt_pin+0x52/0xf0 [i915]
[  648.248576]intel_ring_pin+0x5b/0x110 [i915]
[  648.248628]__intel_context_do_pin_ww+0xd3/0x510 [i915]
[  648.248681]__intel_context_do_pin+0x55/0x90 [i915]
[  648.248734]intel_engines_init+0x43d/0x570 [i915]
[  648.248787]intel_gt_init+0x119/0x2d0 [i915]
[  648.248848]i915_gem_init+0x133/0x1c0 [i915]
[  648.248895]i915_driver_probe+0x68d/0xc90 [i915]
[  648.248943]i915_pci_probe+0x45/0x120 [i915]
[  648.248952]pci_device_probe+0xd8/0x150
[  648.248960]really_probe+0x259/0x460
[  648.248967]driver_probe_device+0x50/0xb0
[  648.248973]device_driver_attach+0xad/0xc0
[  648.248980]__driver_attach+0x75/0x110
[  648.248988]bus_for_each_dev+0x7c/0xc0
[  648.248995]driver_attach+0x19/0x20
[  648.249001]bus_add_driver+0x117/0x1c0
[  648.249008]driver_register+0x8c/0xe0
[  648.249015]__pci_register_driver+0x6e/0x80
[  648.249022]0xc0a5c061
[  648.249028]do_one_initcall+0x5a/0x2c0
[  648.249036]do_init_module+0x5d/0x240
[  648.249043]load_module+0x2367/0x2710
[  648.249049]__do_sys_finit_module+0xb6/0xf0
[  648.249056]__x64_sys_finit_module+0x15/0x20
[  648.249064]do_syscall_64+0x38/0x50
[  648.249071]entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  648.249078]
   -> #2 (reservation_ww_class_mutex){+.+.}-{3:3}:
[  648.249093]__ww_mutex_lock.constprop.0+0xac/0x1090
[  648.249100]ww_mutex_lock+0x3d/0xa0
[  648.249108]dma_resv_lockdep+0x141/0x281
[  648.249114]do_one_initcall+0x5a/0x2c0
[  648.249121]kernel_init_freeable+0x220/0x267
[  648.249129]kernel_init+0x9/0xf8
[  648.249135]ret_from_fork+0x22/0x30
[  648.249140]
   -> #1 (reservation_ww_class_acquire){+.+.}-{0:0}:
[  648.249155]dma_resv_lockdep+0x115/0x281
[  648.249162]do_one_initcall+0x5a/0x2c0
[  648.249168]kernel_init_freeable+0x220/0x267
[  648.249176]kernel_init+0x9/0xf8
[  648.249182]ret_from_fork+0x22/0x30
[  648.249188]
   -> #0 (&mm->mmap_lock#2){++++}-{3:3}:
[  648.249203]__lock_acquire+0x125d/0x2160
[  648.249210]lock_acquire+0x137/0x3e0
[  648.249217]__might_fault+0x59/0x80
[  648.249223]perf_copy_attr+0x35/0x340
[  648.249230]_perf_ioctl+0x3e1/0xd40
[  648.249237]perf_ioctl+0x34/0x60
[  648.249245]