On Wed, Apr 13, 2016 at 1:07 AM, Luis R. Rodriguez <mcg...@kernel.org> wrote: > On Mon, Apr 11, 2016 at 03:52:43PM +0200, Christian König wrote: >> Am 11.04.2016 um 15:39 schrieb Oded Gabbay: >> >On Mon, Apr 11, 2016 at 4:28 PM, Christian König >> ><christian.koe...@amd.com> wrote: >> >>Am 09.04.2016 um 02:25 schrieb Luis R. Rodriguez: >> >>>On Tue, Mar 29, 2016 at 10:41 AM, Luis R. Rodriguez <mcg...@kernel.org> >> >>>wrote: >> >>>>We need to ensure amd iommu v2 initializes before >> >>>>driver uses such as drivers/gpu/drm/amd/amdkfd/kfd_module.c, >> >>>>to do this make its init routine a subsys_initcall() which >> >>>>ensures its load init is called first than modules when >> >>>>built-in. >> >>>> >> >>>>This reverts the old work around implemented through commit >> >>>>1bacc894c227fad8a7 ("drivers: Move iommu/ before gpu/ in Makefile"), >> >>>>instead of making the dependency implicit by linker order this >> >>>>makes the ordering requirement explicit through proper kernel >> >>>>APIs. >> >>>> >> >>>>Cc: Oded Gabbay <oded.gab...@amd.com> >> >>>>Cc: Christian König <christian.koe...@amd.com> >> >>>>Signed-off-by: Luis R. Rodriguez <mcg...@kernel.org> >> >> >> >>Sorry for not responding earlier. Just coming back to all the stuff on my >> >>TODO list. >> >> >> >>Patch is Acked-by: Christian König <christian.koe...@amd.com> >> > >> >Christian, >> >Just wanted to be sure if you tested this patch-set or not. >> >> I did NOT tested it. If AMD IOMMU requires something which will now >> initialize after the IOMMU module we will obviously run into trouble >> again. >> >> I assumed that the creator of the patch did some testing. > > Nope, hence [RTF] Request For Testing. > >> >I don't think it should be merged without testing. If you already >> >tested it than fine. If not, I think I can do it in the next week or >> >so (just came back from PTO). >> >> Yeah, agree totally. > > Agreed, please let me know if someone is able to test and confirm > this works. It should work. > > Luis
Hi, So I finally got to test this patch and it's not working. The reason is that AMD IOMMUv2 gets initialized *before* AMD IOMMUv1 driver ! So, we get this from dmesg: [ 0.439612] AMD IOMMUv2 driver by Joerg Roedel <jroe...@suse.de> [ 0.439614] AMD IOMMUv2 functionality not available on this system later we get: [ 1.084749] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40 [ 1.084750] AMD-Vi: Extended features: PPR GT IA PC [ 1.084754] AMD-Vi: Interrupt remapping enabled [ 1.085125] AMD-Vi: Lazy IO/TLB flushing enabled And when I run some HSA samples, at the tear-down process stage we get this: [ 7937.740306] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [ 7937.740932] IP: [<ffffffff819661c2>] mutex_lock+0x12/0x30 [ 7937.741534] PGD 55a57067 PUD 55a56067 PMD 0 [ 7937.742127] Oops: 0002 [#1] SMP [ 7937.742709] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec kvm_amd kvm snd_hda_core snd_hwdep snd_seq edac_mce_amd edac_core snd_seq_device irqbypass crct10dif_pclmul crc32_pclmul ppdev pcspkr crc32c_intel snd_pcm fuse ghash_clmulni_intel sp5100_tco acpi_cpufreq parport_pc i2c_piix4 snd_timer fam15h_power k10temp video tpm_infineon tpm_tis parport shpchp snd tpm soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc serio_raw uas usb_storage r8169 mii fjes [ 7937.746867] CPU: 2 PID: 2260 Comm: kfdtest Not tainted 4.6.0-rc3 #5 [ 7937.747600] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./F2A88XM-D3H, BIOS F5 01/09/2014 [ 7937.748363] task: ffff88006944d880 ti: ffff88005b734000 task.ti: ffff88005b734000 [ 7937.749136] RIP: 0010:[<ffffffff819661c2>] [<ffffffff819661c2>] mutex_lock+0x12/0x30 [ 7937.749918] RSP: 0018:ffff88005b737cc8 EFLAGS: 00010246 [ 7937.750695] RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000000 [ 7937.751491] RDX: 0000000080000000 RSI: ffff880000000000 RDI: 0000000000000020 [ 7937.752286] RBP: ffff88005b737cd0 R08: 0000000000000002 R09: ffffffff815321e2 [ 7937.753084] R10: ffffea00016f71c0 R11: 0000000000000246 R12: ffff880035239680 [ 7937.753884] R13: 0000000000000000 R14: 0000000000000001 R15: ffff88005b737d10 [ 7937.754684] FS: 00007fe0bc68a740(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000 [ 7937.755504] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7937.756328] CR2: 0000000000000020 CR3: 0000000057198000 CR4: 00000000000406e0 [ 7937.757168] Stack: [ 7937.758004] 0000000000000000 ffff88005b737d80 ffffffff810bae2c ffff88005b737d48 [ 7937.758872] 0000000000000020 ffff880000000000 0000000000000000 ffff88005b737d38 [ 7937.759737] ffff88005b737d38 ffff88005b737d10 ffff88005b737d10 ffff8800ffffffff [ 7937.760606] Call Trace: [ 7937.761474] [<ffffffff810bae2c>] flush_workqueue+0x9c/0x530 [ 7937.762348] [<ffffffff818107a4>] ? amd_iommu_domain_clear_gcr3+0x84/0xa0 [ 7937.763225] [<ffffffff818157d0>] mn_release+0x60/0x70 [ 7937.764107] [<ffffffff81210c64>] __mmu_notifier_release+0x44/0xc0 [ 7937.764998] [<ffffffff811efbca>] exit_mmap+0x15a/0x170 [ 7937.765889] [<ffffffff811e8b33>] ? handle_mm_fault+0x14e3/0x1d50 [ 7937.766784] [<ffffffff814b06d0>] ? n_tty_open+0xd0/0xd0 [ 7937.767677] [<ffffffff81124cac>] ? exit_robust_list+0x5c/0x110 [ 7937.768573] [<ffffffff810a199b>] mmput+0x5b/0x100 [ 7937.769463] [<ffffffff810a8566>] do_exit+0x276/0xb30 [ 7937.770349] [<ffffffff810684a5>] ? __do_page_fault+0x205/0x4d0 [ 7937.771226] [<ffffffff810a8ea7>] do_group_exit+0x47/0xb0 [ 7937.772090] [<ffffffff810a8f24>] SyS_exit_group+0x14/0x20 [ 7937.772941] [<ffffffff819684f2>] entry_SYSCALL_64_fastpath+0x1a/0xa4 [ 7937.773801] Code: 83 f8 01 0f 85 6d ff ff ff eb db e8 f9 e4 73 ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb e8 1e e4 ff ff <f0> ff 0b 79 08 48 89 df e8 c1 fe ff ff 65 48 8b 04 25 c0 d2 00 [ 7937.775671] RIP [<ffffffff819661c2>] mutex_lock+0x12/0x30 [ 7937.776588] RSP <ffff88005b737cc8> [ 7937.777499] CR2: 0000000000000020 [ 7937.781746] ---[ end trace 46aeb31f738c07ff ]--- [ 7937.781748] Fixing recursive fault but reboot is needed!