Re: Deadlock cpuctx_mutex / pmus_lock / &mm->mmap_lock#2
On Thu, Nov 19 2020 at 13:25, Chris Wilson wrote:
> Quoting Peter Zijlstra (2020-11-19 13:02:44)
>> Chris, I suspect this is due to i915 calling stop machine with all sorts
>> of locks held. Is there anything to be done about this? stop_machine()
>> is really nasty to begin with.
>>
>> What problem is it trying to solve?
>
> If there is any concurrent access through a PCI bar (that is exported to
> userspace via mmap) as the GTT is updated, it results in undefined HW
> behaviour (where that is not limited to users writing to other system
> pages).
>
> stop_machine() is the most foolproof method we know that works.

It's also the biggest hammer and is going to cause latencies even on
CPUs which are not involved at all. We have already enough trouble vs.
WBINVD latency wise, so no need to add yet another way to hurt everyone.

As the gfx muck knows which processes have stuff mapped, there are
certainly ways to make them and only them rendezvous, and to do so
while staying preemptible otherwise. It might take a RESCHED_IPI to all
CPUs to achieve that, but that's a cheap operation compared to what you
want to do.

Thanks,

        tglx
Re: Deadlock cpuctx_mutex / pmus_lock / &mm->mmap_lock#2
On Thu, Nov 19, 2020 at 03:19:14PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 19, 2020 at 01:25:11PM +0000, Chris Wilson wrote:
> > Quoting Peter Zijlstra (2020-11-19 13:02:44)
> > > Chris, I suspect this is due to i915 calling stop machine with all sorts
> > > of locks held. Is there anything to be done about this? stop_machine()
> > > is really nasty to begin with.
> > >
> > > What problem is it trying to solve?
> >
> > If there is any concurrent access through a PCI bar (that is exported to
> > userspace via mmap) as the GTT is updated, it results in undefined HW
> > behaviour (where that is not limited to users writing to other system
> > pages).
> >
> > stop_machine() is the most foolproof method we know that works.
>
> Sorry, I don't understand. It tries to do what? And why does it need to
> do that holding locks.
>
> Really, this is very bad form.

Having poked around at the code, do I get it correct that this is using
stop-machine to set IOMMU page-table entries, because the hardware
cannot deal with two CPUs writing to the same device page-tables; which
would be possible because that memory is exposed through PCI bars?

Can't you simply exclude that memory from being visible through the PCI
bar crud? Having to use stop-machine seems tragic, doubly so because
nobody should actually be having that memory mapped in the first place.
Re: Deadlock cpuctx_mutex / pmus_lock / &mm->mmap_lock#2
On Thu, Nov 19, 2020 at 01:25:11PM +0000, Chris Wilson wrote:
> Quoting Peter Zijlstra (2020-11-19 13:02:44)
> > Chris, I suspect this is due to i915 calling stop machine with all sorts
> > of locks held. Is there anything to be done about this? stop_machine()
> > is really nasty to begin with.
> >
> > What problem is it trying to solve?
>
> If there is any concurrent access through a PCI bar (that is exported to
> userspace via mmap) as the GTT is updated, it results in undefined HW
> behaviour (where that is not limited to users writing to other system
> pages).
>
> stop_machine() is the most foolproof method we know that works.

Sorry, I don't understand. It tries to do what? And why does it need to
do that holding locks.

Really, this is very bad form.

> This particular cycle is easy to break by moving the copy_to_user to
> after releasing perf_event_ctx_unlock in perf_read().

The splat in question is about the ioctl()s, but yeah, that too. Not
sure how easy that is.

I'm also not sure that'll solve your problem; cpu_hotplug_lock is a big
lock, there's tons of stuff inside it.
Re: Deadlock cpuctx_mutex / pmus_lock / &mm->mmap_lock#2
Quoting Peter Zijlstra (2020-11-19 13:02:44)
> Chris, I suspect this is due to i915 calling stop machine with all sorts
> of locks held. Is there anything to be done about this? stop_machine()
> is really nasty to begin with.
>
> What problem is it trying to solve?

If there is any concurrent access through a PCI bar (that is exported to
userspace via mmap) as the GTT is updated, it results in undefined HW
behaviour (where that is not limited to users writing to other system
pages).

stop_machine() is the most foolproof method we know that works.

This particular cycle is easy to break by moving the copy_to_user to
after releasing perf_event_ctx_unlock in perf_read().
-Chris
Re: Deadlock cpuctx_mutex / pmus_lock / &mm->mmap_lock#2
Chris, I suspect this is due to i915 calling stop machine with all sorts
of locks held. Is there anything to be done about this? stop_machine()
is really nasty to begin with.

What problem is it trying to solve?

On Thu, Nov 19, 2020 at 12:04:56AM +0100, Heiner Kallweit wrote:
> Just got the following when running perf.
>
> [ 648.247718] ======================================================
> [ 648.247725] WARNING: possible circular locking dependency detected
> [ 648.247734] 5.10.0-rc4-next-20201118+ #1 Not tainted
> [ 648.247740] ------------------------------------------------------
> [ 648.247748] perf/19761 is trying to acquire lock:
> [ 648.247755] a00200abad18 (&mm->mmap_lock#2){}-{3:3}, at: __might_fault+0x2f/0x80
> [ 648.24] but task is already holding lock:
> [ 648.247785] a0027bc2edb0 (&cpuctx_mutex){+.+.}-{3:3}, at: perf_event_ctx_lock_nested+0xd8/0x1f0
> [ 648.247801] which lock already depends on the new lock.
>
> [ 648.247810] the existing dependency chain (in reverse order) is:
>
> [ 648.247818] -> #5 (&cpuctx_mutex){+.+.}-{3:3}:
> [ 648.247834]        __mutex_lock+0x88/0x900
> [ 648.247840]        mutex_lock_nested+0x16/0x20
> [ 648.247848]        perf_event_init_cpu+0x89/0x140
> [ 648.247857]        perf_event_init+0x172/0x1a0
> [ 648.247864]        start_kernel+0x655/0x7de
> [ 648.247871]        x86_64_start_reservations+0x24/0x26
> [ 648.247878]        x86_64_start_kernel+0x70/0x74
> [ 648.247887]        secondary_startup_64_no_verify+0xb0/0xbb
>
> [ 648.247894] -> #4 (pmus_lock){+.+.}-{3:3}:
> [ 648.247907]        __mutex_lock+0x88/0x900
> [ 648.247914]        mutex_lock_nested+0x16/0x20
> [ 648.247921]        perf_event_init_cpu+0x52/0x140
> [ 648.247929]        cpuhp_invoke_callback+0xa4/0x810
> [ 648.247937]        _cpu_up+0xaa/0x150
> [ 648.247943]        cpu_up+0x79/0x90
> [ 648.247949]        bringup_nonboot_cpus+0x4d/0x60
> [ 648.247958]        smp_init+0x25/0x65
> [ 648.247964]        kernel_init_freeable+0x144/0x267
> [ 648.247972]        kernel_init+0x9/0xf8
> [ 648.247978]        ret_from_fork+0x22/0x30
>
> [ 648.247984] -> #3 (cpu_hotplug_lock){}-{0:0}:
> [ 648.247998]        cpus_read_lock+0x38/0xb0
> [ 648.248006]        stop_machine+0x18/0x40
> [ 648.248075]        bxt_vtd_ggtt_insert_entries__BKL+0x37/0x50 [i915]
> [ 648.248129]        ggtt_bind_vma+0x43/0x60 [i915]
> [ 648.248192]        __vma_bind+0x38/0x40 [i915]
> [ 648.248242]        fence_work+0x21/0xac [i915]
> [ 648.248292]        fence_notify+0x95/0x134 [i915]
> [ 648.248342]        __i915_sw_fence_complete+0x3b/0x1d0 [i915]
> [ 648.248394]        i915_sw_fence_commit+0x12/0x20 [i915]
> [ 648.248458]        i915_vma_pin_ww+0x25c/0x8c0 [i915]
> [ 648.248520]        i915_ggtt_pin+0x52/0xf0 [i915]
> [ 648.248576]        intel_ring_pin+0x5b/0x110 [i915]
> [ 648.248628]        __intel_context_do_pin_ww+0xd3/0x510 [i915]
> [ 648.248681]        __intel_context_do_pin+0x55/0x90 [i915]
> [ 648.248734]        intel_engines_init+0x43d/0x570 [i915]
> [ 648.248787]        intel_gt_init+0x119/0x2d0 [i915]
> [ 648.248848]        i915_gem_init+0x133/0x1c0 [i915]
> [ 648.248895]        i915_driver_probe+0x68d/0xc90 [i915]
> [ 648.248943]        i915_pci_probe+0x45/0x120 [i915]
> [ 648.248952]        pci_device_probe+0xd8/0x150
> [ 648.248960]        really_probe+0x259/0x460
> [ 648.248967]        driver_probe_device+0x50/0xb0
> [ 648.248973]        device_driver_attach+0xad/0xc0
> [ 648.248980]        __driver_attach+0x75/0x110
> [ 648.248988]        bus_for_each_dev+0x7c/0xc0
> [ 648.248995]        driver_attach+0x19/0x20
> [ 648.249001]        bus_add_driver+0x117/0x1c0
> [ 648.249008]        driver_register+0x8c/0xe0
> [ 648.249015]        __pci_register_driver+0x6e/0x80
> [ 648.249022]        0xc0a5c061
> [ 648.249028]        do_one_initcall+0x5a/0x2c0
> [ 648.249036]        do_init_module+0x5d/0x240
> [ 648.249043]        load_module+0x2367/0x2710
> [ 648.249049]        __do_sys_finit_module+0xb6/0xf0
> [ 648.249056]        __x64_sys_finit_module+0x15/0x20
> [ 648.249064]        do_syscall_64+0x38/0x50
> [ 648.249071]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> [ 648.249078] -> #2 (reservation_ww_class_mutex){+.+.}-{3:3}:
> [ 648.249093]        __ww_mutex_lock.constprop.0+0xac/0x1090
> [ 648.249100]        ww_mutex_lock+0x3d/0xa0
> [ 648.249108]        dma_resv_lockdep+0x141/0x281
> [ 648.249114]        do_one_initcall+0x5a/0x2c0
> [ 648.249121]        kernel_init_freeable+0x220/0x267
> [ 648.249129]        kernel_init+0x9/0xf8
> [ 648.249135]        ret_from_fork+0x22/0x30
>
> [ 648.249140] -> #1 (reservation_ww_class_acquire){+.+.}-{0:0}:
> [ 648.249155]        dma_resv_lockdep+0x115/0x281
> [ 648.249162]        do_one_initcall+0x5a/0x2c0
> [ 648.249168]
Deadlock cpuctx_mutex / pmus_lock / &mm->mmap_lock#2
Just got the following when running perf.

[ 648.247718] ======================================================
[ 648.247725] WARNING: possible circular locking dependency detected
[ 648.247734] 5.10.0-rc4-next-20201118+ #1 Not tainted
[ 648.247740] ------------------------------------------------------
[ 648.247748] perf/19761 is trying to acquire lock:
[ 648.247755] a00200abad18 (&mm->mmap_lock#2){}-{3:3}, at: __might_fault+0x2f/0x80
[ 648.24] but task is already holding lock:
[ 648.247785] a0027bc2edb0 (&cpuctx_mutex){+.+.}-{3:3}, at: perf_event_ctx_lock_nested+0xd8/0x1f0
[ 648.247801] which lock already depends on the new lock.

[ 648.247810] the existing dependency chain (in reverse order) is:

[ 648.247818] -> #5 (&cpuctx_mutex){+.+.}-{3:3}:
[ 648.247834]        __mutex_lock+0x88/0x900
[ 648.247840]        mutex_lock_nested+0x16/0x20
[ 648.247848]        perf_event_init_cpu+0x89/0x140
[ 648.247857]        perf_event_init+0x172/0x1a0
[ 648.247864]        start_kernel+0x655/0x7de
[ 648.247871]        x86_64_start_reservations+0x24/0x26
[ 648.247878]        x86_64_start_kernel+0x70/0x74
[ 648.247887]        secondary_startup_64_no_verify+0xb0/0xbb

[ 648.247894] -> #4 (pmus_lock){+.+.}-{3:3}:
[ 648.247907]        __mutex_lock+0x88/0x900
[ 648.247914]        mutex_lock_nested+0x16/0x20
[ 648.247921]        perf_event_init_cpu+0x52/0x140
[ 648.247929]        cpuhp_invoke_callback+0xa4/0x810
[ 648.247937]        _cpu_up+0xaa/0x150
[ 648.247943]        cpu_up+0x79/0x90
[ 648.247949]        bringup_nonboot_cpus+0x4d/0x60
[ 648.247958]        smp_init+0x25/0x65
[ 648.247964]        kernel_init_freeable+0x144/0x267
[ 648.247972]        kernel_init+0x9/0xf8
[ 648.247978]        ret_from_fork+0x22/0x30

[ 648.247984] -> #3 (cpu_hotplug_lock){}-{0:0}:
[ 648.247998]        cpus_read_lock+0x38/0xb0
[ 648.248006]        stop_machine+0x18/0x40
[ 648.248075]        bxt_vtd_ggtt_insert_entries__BKL+0x37/0x50 [i915]
[ 648.248129]        ggtt_bind_vma+0x43/0x60 [i915]
[ 648.248192]        __vma_bind+0x38/0x40 [i915]
[ 648.248242]        fence_work+0x21/0xac [i915]
[ 648.248292]        fence_notify+0x95/0x134 [i915]
[ 648.248342]        __i915_sw_fence_complete+0x3b/0x1d0 [i915]
[ 648.248394]        i915_sw_fence_commit+0x12/0x20 [i915]
[ 648.248458]        i915_vma_pin_ww+0x25c/0x8c0 [i915]
[ 648.248520]        i915_ggtt_pin+0x52/0xf0 [i915]
[ 648.248576]        intel_ring_pin+0x5b/0x110 [i915]
[ 648.248628]        __intel_context_do_pin_ww+0xd3/0x510 [i915]
[ 648.248681]        __intel_context_do_pin+0x55/0x90 [i915]
[ 648.248734]        intel_engines_init+0x43d/0x570 [i915]
[ 648.248787]        intel_gt_init+0x119/0x2d0 [i915]
[ 648.248848]        i915_gem_init+0x133/0x1c0 [i915]
[ 648.248895]        i915_driver_probe+0x68d/0xc90 [i915]
[ 648.248943]        i915_pci_probe+0x45/0x120 [i915]
[ 648.248952]        pci_device_probe+0xd8/0x150
[ 648.248960]        really_probe+0x259/0x460
[ 648.248967]        driver_probe_device+0x50/0xb0
[ 648.248973]        device_driver_attach+0xad/0xc0
[ 648.248980]        __driver_attach+0x75/0x110
[ 648.248988]        bus_for_each_dev+0x7c/0xc0
[ 648.248995]        driver_attach+0x19/0x20
[ 648.249001]        bus_add_driver+0x117/0x1c0
[ 648.249008]        driver_register+0x8c/0xe0
[ 648.249015]        __pci_register_driver+0x6e/0x80
[ 648.249022]        0xc0a5c061
[ 648.249028]        do_one_initcall+0x5a/0x2c0
[ 648.249036]        do_init_module+0x5d/0x240
[ 648.249043]        load_module+0x2367/0x2710
[ 648.249049]        __do_sys_finit_module+0xb6/0xf0
[ 648.249056]        __x64_sys_finit_module+0x15/0x20
[ 648.249064]        do_syscall_64+0x38/0x50
[ 648.249071]        entry_SYSCALL_64_after_hwframe+0x44/0xa9

[ 648.249078] -> #2 (reservation_ww_class_mutex){+.+.}-{3:3}:
[ 648.249093]        __ww_mutex_lock.constprop.0+0xac/0x1090
[ 648.249100]        ww_mutex_lock+0x3d/0xa0
[ 648.249108]        dma_resv_lockdep+0x141/0x281
[ 648.249114]        do_one_initcall+0x5a/0x2c0
[ 648.249121]        kernel_init_freeable+0x220/0x267
[ 648.249129]        kernel_init+0x9/0xf8
[ 648.249135]        ret_from_fork+0x22/0x30

[ 648.249140] -> #1 (reservation_ww_class_acquire){+.+.}-{0:0}:
[ 648.249155]        dma_resv_lockdep+0x115/0x281
[ 648.249162]        do_one_initcall+0x5a/0x2c0
[ 648.249168]        kernel_init_freeable+0x220/0x267
[ 648.249176]        kernel_init+0x9/0xf8
[ 648.249182]        ret_from_fork+0x22/0x30

[ 648.249188] -> #0 (&mm->mmap_lock#2){}-{3:3}:
[ 648.249203]        __lock_acquire+0x125d/0x2160
[ 648.249210]        lock_acquire+0x137/0x3e0
[ 648.249217]        __might_fault+0x59/0x80
[ 648.249223]        perf_copy_attr+0x35/0x340
[ 648.249230]        _perf_ioctl+0x3e1/0xd40
[ 648.249237]        perf_ioctl+0x34/0x60
[ 648.249245]