> On Apr 18, 2020, at 2:34 PM, Joerg Roedel <[email protected]> wrote:
> 
> On Sat, Apr 18, 2020 at 09:01:35AM -0400, Qian Cai wrote:
>> Hard to tell without testing further. I’ll leave that optimization in
>> the future, and focus on fixing those races first.
> 
> Yeah right, we should fix the existing races first before introducing
> new ones ;)
> 
> Btw, THANKS A LOT for tracking down all these race condition bugs, I am
> not even remotely able to trigger them with the hardware I have around.
> 
> I did some hacking and the attached diff shows how I think this race
> condition needs to be fixed. I boot-tested this fix on-top of v5.7-rc1,
> but did no further testing. Can you test it please?

No dice. There could be some other races. For example,

> @@ -1536,16 +1571,19 @@ static u64 *fetch_pte(struct protection_domain *domain,
>                     unsigned long address,
>                     unsigned long *page_size)
...
>       amd_iommu_domain_get_pgtable(domain, &pgtable);
> 
>       if (address > PM_LEVEL_SIZE(pgtable.mode))
>               return NULL;
> 
>       level      =  pgtable.mode - 1;
>       pte        = &pgtable.root[PM_LEVEL_INDEX(level, address)];

<-- increase_address_space() can run here

>       *page_size =  PTE_LEVEL_PAGE_SIZE(level);
> 

        while (level > 0) {
                *page_size = PTE_LEVEL_PAGE_SIZE(level);

Then in iommu_unmap_page(),

        while (unmapped < page_size) {
                pte = fetch_pte(dom, bus_addr, &unmap_size);
                …
                bus_addr  = (bus_addr & ~(unmap_size - 1)) + unmap_size;

bus_addr would be out of sync with dom->mode when it enters fetch_pte()
again. Could that be a problem?
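As a rough illustration of why the reported size matters, the stepping arithmetic from the loop can be pulled out into a standalone helper. next_bus_addr() is a hypothetical name for this sketch; only the expression in its return statement comes from the code quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the address the unmap loop would visit next. */
static uint64_t next_bus_addr(uint64_t bus_addr, uint64_t unmap_size)
{
	/* unmap_size must be a non-zero power of two for the mask to work. */
	assert(unmap_size != 0 && (unmap_size & (unmap_size - 1)) == 0);

	/* Round down to the start of the current page, then skip past it. */
	return (bus_addr & ~(unmap_size - 1)) + unmap_size;
}
```

Starting from 0x201000, a reported size of 4 KiB moves the loop to 0x202000, but a reported size of 2 MiB moves it to 0x400000. So if fetch_pte() hands back a size derived from a different level than the caller expects, the next iteration resumes at a different address. This sketch does not demonstrate the race itself, only how sensitive the loop is to unmap_size.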


[ 5159.274829][ T4057] LTP: starting oom02
[ 5160.382787][   C52] perf: interrupt took too long (7443 > 6208), lowering kernel.perf_event_max_sample_rate to 26800
[ 5167.951785][  T812] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xfffffffffffc0000 flags=0x0010]
[ 5167.964540][  T812] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xfffffffffffc1000 flags=0x0010]
[ 5167.977442][  T812] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xfffffffffffc1900 flags=0x0010]
[ 5167.989901][  T812] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xfffffffffffc1d00 flags=0x0010]
[ 5168.002291][  T812] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xfffffffffffc2000 flags=0x0010]
[ 5168.014665][  T812] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xfffffffffffc2400 flags=0x0010]
[ 5168.027132][  T812] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xfffffffffffc2800 flags=0x0010]
[ 5168.039566][  T812] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xfffffffffffc2c00 flags=0x0010]
[ 5168.051956][  T812] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xfffffffffffc3000 flags=0x0010]
[ 5168.064310][  T812] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xfffffffffffc3400 flags=0x0010]
[ 5168.076652][  T812] AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 domain=0x0027 address=0xfffffffffffc3800 flags=0x0010]
[ 5168.088290][  T812] AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 domain=0x0027 address=0xfffffffffffc3c00 flags=0x0010]
[ 5183.695829][    C8] smartpqi 0000:23:00.0: controller is offline: status code 0x14803
[ 5183.704390][    C8] smartpqi 0000:23:00.0: controller offline
[ 5183.756594][  C101] blk_update_request: I/O error, dev sda, sector 22306304 op 0x1:(WRITE) flags 0x8000000 phys_seg 4 prio class 0
[ 5183.756628][   C34] sd 0:1:0:0: [sda] tag#655 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.756759][   C56] blk_update_request: I/O error, dev sda, sector 58480128 op 0x1:(WRITE) flags 0x8004000 phys_seg 4 prio class 0
[ 5183.756810][   C79] sd 0:1:0:0: [sda] tag#234 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.756816][  C121] sd 0:1:0:0: [sda] tag#104 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.756837][   T53] sd 0:1:0:0: [sda] tag#4 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.756882][  C121] sd 0:1:0:0: [sda] tag#104 CDB: opcode=0x2a 2a 00 00 4d d4 00 00 02 00 00
[ 5183.756892][   C79] sd 0:1:0:0: [sda] tag#234 CDB: opcode=0x2a 2a 00 02 03 e4 00 00 02 00 00
[ 5183.756909][  C121] blk_update_request: I/O error, dev sda, sector 5100544 op 0x1:(WRITE) flags 0x8004000 phys_seg 4 prio class 0
[ 5183.756920][   C79] blk_update_request: I/O error, dev sda, sector 33809408 op 0x1:(WRITE) flags 0x8004000 phys_seg 4 prio class 0
[ 5183.756939][   T53] sd 0:1:0:0: [sda] tag#4 CDB: opcode=0x2a 2a 00 02 4b f8 00 00 02 00 00
[ 5183.756967][   T53] blk_update_request: I/O error, dev sda, sector 38533120 op 0x1:(WRITE) flags 0x8004000 phys_seg 4 prio class 0
[ 5183.756989][   C49] blk_update_request: I/O error, dev sda, sector 30181376 op 0x1:(WRITE) flags 0x8004000 phys_seg 4 prio class 0
[ 5183.757045][   C51] sd 0:1:0:0: [sda] tag#452 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.757107][   C51] sd 0:1:0:0: [sda] tag#452 CDB: opcode=0x2a 2a 00 02 95 06 00 00 02 00 00
[ 5183.757125][   C51] blk_update_request: I/O error, dev sda, sector 43320832 op 0x1:(WRITE) flags 0x8004000 phys_seg 4 prio class 0
[ 5183.757199][   C82] blk_update_request: I/O error, dev sda, sector 10187776 op 0x1:(WRITE) flags 0x8004000 phys_seg 4 prio class 0
[ 5183.757209][  C109] blk_update_request: I/O error, dev sda, sector 29812736 op 0x1:(WRITE) flags 0x8004000 phys_seg 4 prio class 0
[ 5183.757215][   C77] sd 0:1:0:0: [sda] tag#849 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.757222][  C110] sd 0:1:0:0: [sda] tag#558 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.757237][   C92] blk_update_request: I/O error, dev sda, sector 6410240 op 0x1:(WRITE) flags 0x8004000 phys_seg 4 prio class 0
[ 5183.757244][   C91] sd 0:1:0:0: [sda] tag#73 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.757251][   C68] sd 0:1:0:0: [sda] tag#416 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.757458][   C77] sd 0:1:0:0: [sda] tag#849 CDB: opcode=0x2a 2a 00 02 78 a4 00 00 02 00 00
[ 5183.757467][  C110] sd 0:1:0:0: [sda] tag#558 CDB: opcode=0x2a 2a 00 03 58 94 00 00 02 00 00
[ 5183.757515][  C122] sd 0:1:0:0: [sda] tag#747 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.757525][   C68] sd 0:1:0:0: [sda] tag#416 CDB: opcode=0x2a 2a 00 01 0e 32 00 00 02 00 00
[ 5183.757536][   C91] sd 0:1:0:0: [sda] tag#73 CDB: opcode=0x2a 2a 00 01 a2 86 00 00 02 00 00
[ 5183.757727][  C122] sd 0:1:0:0: [sda] tag#747 CDB: opcode=0x2a 2a 00 02 a7 24 00 00 02 00 00
[ 5183.758530][   T53] Write-error on swap-device (254:1:64823296)
[ 5183.758758][   T53] Write-error on swap-device (254:1:35201024)
[ 5183.758811][  C105] Write-error on swap-device (254:1:52690944)
[ 5183.758959][   C82] Write-error on swap-device (254:1:6856704)


