https://bugzilla.kernel.org/show_bug.cgi?id=221087

            Bug ID: 221087
           Summary: a1224f34d72a (Checking states of power resources
                    during initialization) causes NVMe #2 and Nvidia dGPU
                    loss on TigerLake-H (THUNDEROBOT ZERO)
           Product: ACPI
           Version: 2.5
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: Power-Other
          Assignee: [email protected]
          Reporter: [email protected]
        Regression: No

Created attachment 309356
  --> https://bugzilla.kernel.org/attachment.cgi?id=309356&action=edit
acpidump

This is a follow-up to a comment I left three years ago
(https://bugzilla.kernel.org/show_bug.cgi?id=214035#c18). I have since traced
the problem further and would like to add more details.

I have a gaming laptop (model: THUNDEROBOT ZERO) with two NVMe slots. I hit
the same issue that bug #214035 describes: one NVMe drive is unexpectedly
powered off. In my case, since upgrading to Linux v5.13, whenever a drive is
installed in the second NVMe slot, that drive and the NVIDIA GPU are both
powered off during boot. As a result I get "waiting for root device xxxx" (if
the system disk is NVMe #2) or "a start job is running for /dev/xxxx" (if the
system disk is NVMe #1 and fstab is configured to mount #2 automatically).

Relevant dmesg (tested with 6.18.9 from the official Arch repository):

```
...
[    0.966711] nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[    0.966878] NVRM: The NVIDIA GPU 0000:01:00.0
               NVRM: (PCI ID: 10de:2520) installed in this system has
               NVRM: fallen off the bus and is not responding to commands.
[    0.966929] nvidia 0000:01:00.0: probe with driver nvidia failed with error -1
[    0.966970] NVRM: The NVIDIA probe routine failed for 1 device(s).
[    0.966971] NVRM: None of the NVIDIA devices were initialized.
...
[    1.712648] nvme nvme0: pci function 0000:02:00.0
[    1.712648] nvme nvme1: pci function 0000:03:00.0
[    1.712671] nvme 0000:02:00.0: Unable to change power state from D3cold to D0, device inaccessible
[    1.749809] nvme nvme1: allocated 64 MiB host memory buffer (16 segments).
[    1.769968] nvme nvme1: 8/0/0 default/read/poll queues
...
```
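To check which PCI functions hit this failure on a given boot, a small script can scan the kernel log for the "Unable to change power state from D3cold to D0" message quoted above. This is a minimal sketch; it assumes the unwrapped, one-message-per-line form that `dmesg` prints, and the helper name `inaccessible_devices` is hypothetical.

```python
import re

# Matches e.g.
# "nvme 0000:02:00.0: Unable to change power state from D3cold to D0, device inaccessible"
PATTERN = re.compile(
    r"\b([0-9a-f]{4,}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Unable to change power state from D3cold to D0, device inaccessible"
)


def inaccessible_devices(log_text):
    """Return PCI addresses that failed the D3cold -> D0 transition, in log order."""
    found = []
    for match in PATTERN.finditer(log_text):
        addr = match.group(1)
        if addr not in found:  # report each device once
            found.append(addr)
    return found
```

For example, `inaccessible_devices(subprocess.run(["dmesg"], capture_output=True, text=True).stdout)` would list 0000:01:00.0 and 0000:02:00.0 for the boot shown above (reading the log may require root, depending on `kernel.dmesg_restrict`).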

According to bug #214035, this issue was fixed in v5.15.x, and that matches
what I saw: NVMe #2 and the NVIDIA GPU work with v5.15 through v5.15.12.
However, the problem reappeared in v5.16-rc1 and has persisted ever since: the
two devices are still powered off at boot with the latest master commit. I'm
not sure whether this counts as a long-standing regression in general, but it
is one for me.

The commit that introduced this bug is a1224f34d72a, shipped in v5.16-rc1.
Reverting it on top of the latest master fixes the problem. One workaround is
to nullify the `_OFF` methods with a DSDT override. Before I knew about the
DSDT override approach, I simply bypassed all calls to `__acpi_power_off` as a
workaround.

---

`lspci` output and the ACPI firmware node paths of the root ports:

```
$ lspci -tv | head -n7
-[0000:00]-+-00.0  Intel Corporation Tiger Lake-H 8 cores Host Bridge/DRAM Registers
           +-01.0-[01]--+-00.0  NVIDIA Corporation GA106M [GeForce RTX 3060 Mobile / Max-Q]
           |            \-00.1  NVIDIA Corporation GA106 High Definition Audio Controller
           +-01.1-[02]----00.0  MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1202 (DRAM-less)
           +-02.0  Intel Corporation TigerLake-H GT1 [UHD Graphics]
           +-04.0  Intel Corporation TigerLake-LP Dynamic Tuning Processor Participant
           +-06.0-[03]----00.0  Kingston Technology Company, Inc. NV2 NVMe SSD [SM2267XT] (DRAM-less)

$ cat /sys/bus/pci/devices/0000:00:01.0/firmware_node/path
\_SB_.PC00.PEG1
$ cat /sys/bus/pci/devices/0000:00:01.1/firmware_node/path
\_SB_.PC00.PEG2
$ cat /sys/bus/pci/devices/0000:00:06.0/firmware_node/path
\_SB_.PC00.PEG0
```
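The same `firmware_node/path` lookup can be done for every PCI device at once, which makes it easy to see which ACPI device object (PEG0, PEG1, PEG2, ...) sits behind each slot. A minimal sketch, assuming the sysfs layout used in the commands above; `firmware_paths` is a hypothetical helper, and devices without an ACPI companion are simply skipped.

```python
from pathlib import Path


def firmware_paths(sysfs_root="/sys/bus/pci/devices"):
    """Map each PCI address under sysfs_root to its ACPI namespace path.

    Devices with no firmware_node link, or whose node has no 'path'
    attribute, are left out of the result.
    """
    paths = {}
    for dev in sorted(Path(sysfs_root).iterdir()):
        try:
            paths[dev.name] = (dev / "firmware_node" / "path").read_text().strip()
        except OSError:
            continue  # no ACPI companion or attribute not readable
    return paths
```

On this laptop, `firmware_paths()` would be expected to include `0000:00:01.0 -> \_SB_.PC00.PEG1`, `0000:00:01.1 -> \_SB_.PC00.PEG2` and `0000:00:06.0 -> \_SB_.PC00.PEG0`, matching the `cat` output above.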

The MAXIO NVMe drive is plugged into NVMe slot #2. The problem has nothing to
do with the drive brand: swapping the MAXIO and Kingston drives changes
nothing. It's likely that the NVIDIA GPU and slot #2 (the MAXIO drive) are both
controlled by a single power resource (`PXP`), so turning it off always makes
both devices unavailable.
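To make the suspected failure mode concrete, here is a toy Python model (not the kernel's data structures): two devices share one power resource, and a pass that turns off every resource without recorded users cuts power to both at once. All classes and functions are hypothetical, and the assumption that no device holds a reference to `PXP` when the pass runs is taken from the "not used" debug messages in the tests below.

```python
from dataclasses import dataclass, field


@dataclass
class PowerResource:
    """Toy stand-in for an ACPI power resource such as PXP (hypothetical, not kernel code)."""
    name: str
    on: bool = True                           # firmware left the resource on at boot
    users: set = field(default_factory=set)   # devices the OS has recorded as users

    def turn_off(self):
        # Stands in for evaluating the resource's _OFF method: every device
        # depending on this resource loses power at the same time.
        self.on = False


def turn_off_unused(resources):
    """Turn off each resource that is on but has no recorded users; return their names."""
    turned_off = []
    for res in resources:
        if res.on and not res.users:
            res.turn_off()
            turned_off.append(res.name)
    return turned_off


def accessible(device, deps, resources):
    """A device is reachable only if every power resource it depends on is on."""
    return all(resources[name].on for name in deps[device])


# Assumed topology: the GPU and the NVMe drive in slot #2 both depend on PXP.
DEPS = {"0000:01:00.0": ["PXP"], "0000:02:00.0": ["PXP"]}
```

With `pxp = PowerResource("PXP")`, calling `turn_off_unused([pxp])` returns `["PXP"]`, after which `accessible` is false for both 0000:01:00.0 and 0000:02:00.0; if either device were recorded in `users`, the resource would stay on.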

### More tests

I patched `drivers/acpi/power.c` to print the bus_id of each power resource
about to be turned off in `__acpi_power_off`, as well as of each resource
marked as "unused". Below are the results for: (1) the regression commit
(7a63296d), (2) the parent of the regression commit (a1224f34), and (3) the
latest master commit with 7a63296d reverted.

**(1)**

```
[    0.539642] ACPI: PM: ACPI DEBUG: __acpi_power_off called on resource [PG03]
[    0.539683] acpi device:19: Cannot transition to power state D3hot for parent in D3cold
[    0.539686] acpi device:2a: Cannot transition to power state D3hot for parent in D3cold
[    0.540031] ACPI: PM: ACPI DEBUG: __acpi_power_off called on resource [V0PR]
[    0.540096] ACPI: PM: ACPI DEBUG: __acpi_power_off called on resource [V1PR]
[    0.540158] ACPI: PM: ACPI DEBUG: __acpi_power_off called on resource [V2PR]
[    0.540982] ACPI: PM: ACPI DEBUG: resource [TBT1] not used and would be turned OFF!
[    0.540983] ACPI: PM: ACPI DEBUG: __acpi_power_off called on resource [TBT1]
[    0.543017] ACPI: PM: ACPI DEBUG: resource [WRST] not used and would be turned OFF!
[    0.543018] ACPI: PM: ACPI DEBUG: __acpi_power_off called on resource [WRST]
[    0.543020] ACPI: PM: ACPI DEBUG: resource [BTRT] not used and would be turned OFF!
[    0.543021] ACPI: PM: ACPI DEBUG: __acpi_power_off called on resource [BTRT]
[    0.543023] ACPI: PM: ACPI DEBUG: resource [PXP] not used and would be turned OFF!
[    0.543023] ACPI: PM: ACPI DEBUG: __acpi_power_off called on resource [PXP]
```

**(2)**

```
[    0.549333] ACPI: PM: __acpi_power_off on Resource [PG03]
[    0.549374] acpi device:19: Cannot transition to power state D3hot for parent in D3cold
[    0.549376] acpi device:2a: Cannot transition to power state D3hot for parent in D3cold
[    0.549734] ACPI: PM: __acpi_power_off on Resource [V0PR]
[    0.549798] ACPI: PM: __acpi_power_off on Resource [V1PR]
[    0.549860] ACPI: PM: __acpi_power_off on Resource [V2PR]
```

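No resource is reported as "unused" here. This means PG03, V0PR, V1PR, and
V2PR are not turned off by `acpi_turn_off_unused_power_resources`, so in (1)
that function is responsible only for turning off TBT1, WRST, BTRT, and PXP.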
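The comparison between (1) and (2) can be done mechanically. A minimal sketch, assuming unwrapped logs and the two message variants printed by the local debug patch in (1) and (2) (these are not upstream kernel messages); the function names are hypothetical.

```python
import re

# Both variants of the debug patch's message: "... called on resource [X]" and "... on Resource [X]".
OFF_RE = re.compile(r"__acpi_power_off (?:called )?on [Rr]esource \[(\w+)\]")
UNUSED_RE = re.compile(r"resource \[(\w+)\] not used")


def resources_turned_off(log_text):
    """Power resources passed to __acpi_power_off, in log order, without duplicates."""
    names = []
    for name in OFF_RE.findall(log_text):
        if name not in names:
            names.append(name)
    return names


def resources_marked_unused(log_text):
    """Power resources the debug patch reported as unused."""
    return set(UNUSED_RE.findall(log_text))


def extra_turn_offs(bad_log, good_log):
    """Resources turned off in bad_log but never in good_log, in bad_log order."""
    good = set(resources_turned_off(good_log))
    return [name for name in resources_turned_off(bad_log) if name not in good]
```

With log (1) as `bad_log` and log (2) as `good_log`, `extra_turn_offs` returns TBT1, WRST, BTRT, and PXP, which is exactly the set `resources_marked_unused` reports for (1).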

**(3)**

```
[   20.268422] ACPI: PM: ACPI DEBUG: __acpi_power_off called on resource [TBT0]
[   20.293752] ACPI: PM: ACPI DEBUG: __acpi_power_off called on resource [D3C]
```

Again, no resource is reported as "unused", and only these two resources are
turned off.

---

NVMe #2 and the NVIDIA GPU are controlled by `PXP`, which is defined in
ssdt12. The `acpidump` output is attached.
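To locate where a power resource is declared, the tables in the attached dump can be extracted and disassembled (for example with ACPICA's `acpixtract -a` followed by `iasl -d` on each `.dat` file) and the resulting `.dsl` files searched. A minimal sketch, assuming all disassembled files sit in one directory; `find_power_resource` is a hypothetical helper.

```python
import re
from pathlib import Path


def find_power_resource(name, dsl_dir="."):
    """Return (file name, line number) pairs where PowerResource(name, ...) is declared."""
    # Accept the name with or without trailing '_' padding, optionally
    # preceded by a namespace path such as \_SB.PC00.
    pattern = re.compile(
        r"PowerResource\s*\(\s*(?:\S*\.)?" + re.escape(name.rstrip("_")) + r"_*\s*,"
    )
    hits = []
    for dsl in sorted(Path(dsl_dir).glob("*.dsl")):
        text = dsl.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((dsl.name, lineno))
    return hits
```

Run in the directory holding the disassembled tables, `find_power_resource("PXP")` should point at ssdt12.dsl.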

**System info**

```
~ ❯ uname -a
Linux bczhc-arch 6.18.9-bczhc2-dirty #104 SMP PREEMPT_DYNAMIC Wed Feb 11 23:58:04 CST 2026 x86_64 GNU/Linux
~ ❯ cat /etc/os-release | head -n1
NAME="Arch Linux"
```

_______________________________________________
acpi-bugzilla mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/acpi-bugzilla