Control: found -1 6.17.8-1

Hi,

On Fri, Nov 28, 2025 at 11:50:48AM +0100, J. Neuschäfer wrote:
> On Wed, Nov 05, 2025 at 06:09:43AM +0100, Salvatore Bonaccorso wrote:
> > Control: tags -1 + moreinfo
> > 
> > Hi,
> > 
> > > On Tue, Nov 04, 2025 at 05:05:27PM +0100, J. Neuschäfer wrote:
> > > Package: src:linux
> > > Version: 6.16.12-2
> > > Severity: normal
> > > X-Debbugs-Cc: [email protected]
> > > User: [email protected]
> > > Usertags: amd64
> > > 
> > > Suspend-to-RAM results in a hang that renders most of userspace
> > > unresponsive on my HP ProBook 445, based on an AMD Ryzen 5 5625U with
> > > Radeon Graphics. Fortunately I was still able to log in on a different
> > > TTY and look at htop and dmesg, which showed that systemd-sleep was
> > > stuck in D state. The system was unable to fully enter suspend, or to
> > > resume everything normally. I ran into this issue multiple times with
> > > kernel version 6.16.12, as well as with 6.16.9.
> > > The NVMe storage (PCIe c0a9:0100) may or may not be related to this issue.
> > 
> > Can you please provide the full kernel log so we have the full
> > context?
> 
> Sorry for the delay, it took me a while to test the different versions.

No problem at all, testing takes time (and in most cases so does our
replying :-))

> > Please also try the recently uploaded 6.17.7-1 version in
> > unstable.
> 
> I've tested with 6.17.8+deb14-amd64, which has since been released, and
> the same issue persists. I'm attaching dmesg logs for 6.16 and 6.17, in
> which the bug is triggered.

Ack, I updated the metadata of the bug.

> > Can you additionally pinpoint an earlier version which works, which
> > might give a range where you could start bisecting? (though I assume
> > it is not always reproducible).
> 
> I've been running 6.12.48+deb13-amd64 in the meantime, and it's been
> very stable; I haven't seen this issue.

OK, then we likely have 6.12 as a lower bound and 6.16 as an upper
bound of the range in which the issue appears.

> I can reproduce the bug fairly reliably on 6.16/17 by running a specific
> podman container plus x2go (not entirely sure which parts of this are
> necessary).

Okay, if you have a reliable way to reproduce it, would you be open
to getting your hands a bit dirty and doing some bisecting of the
issue?

The best thing would be to narrow down the range first by installing
Debian packages. I would suggest fetching linux-image packages from
https://snapshot.debian.org/ (remember to install them only for
testing; afterwards switch away from them again and uninstall the ones
no longer needed. While testing you might have to uninstall several of
the test kernels, depending on how much space you have in /boot).
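
For example, fetching and installing such a test kernel could look
roughly like this (the URL and package name below are only
placeholders; look up the real file for the version you want to test
via https://snapshot.debian.org/package/linux/):

    # download the linux-image .deb for the version under test
    wget https://snapshot.debian.org/archive/debian/.../linux-image-..._amd64.deb
    # install it; the bootloader entries are normally updated by the
    # package hooks
    sudo dpkg -i linux-image-..._amd64.deb
    # after testing, boot back into your regular kernel and remove the
    # test kernel again
    sudo apt remove linux-image-...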

For instance, you could first try the first version uploaded from each
major series. Before the first 6.16.y upload to unstable we had the
following versions in experimental:

6.16.3-1
6.16.3-1~bpo13+1
6.16.1-1~exp1
6.16-1~exp1
6.16~rc7-1~exp1
6.15.6-1~exp1
6.15.5-1~exp1
6.15.4-1~exp1
6.15.3-1~exp1
6.15.2-1~exp1
6.15.1-1~exp1
6.15-1~exp1
6.15~rc7-1~exp1
6.14.6-1~exp1
6.14.5-1~exp1
6.14.3-1~exp1
6.13.11-1~exp1
6.13.10-1~exp1
6.13.9-1~exp1
6.13.8-1~exp1
6.13.7-1~exp1
6.13.6-1~exp1
6.13.5-1~exp1
6.13.4-1~exp1
6.13.3-1~exp1
6.13.2-1~exp1
6.13~rc7-1~exp1
6.13~rc6-1~exp1

Ideally we identify the last major upstream version which behaves
correctly and the next major one which exposes the problem.

Now assume we have an indication that the problem appears between
upstream 6.13 and 6.14. Then we can bisect the changes between 6.13
and 6.14. The procedure is as follows:

    git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
    cd linux-stable
    git checkout v6.13
    cp /boot/config-$(uname -r) .config
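    # localmodconfig trims the configuration down to the modules
    # currently loaded, which keeps the build small; piping in 'yes ""'
    # answers any new configuration prompts with their defaults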
    yes '' | make localmodconfig
    make savedefconfig
    mv defconfig arch/x86/configs/my_defconfig

    # test 6.13 to ensure this is "good"
    make my_defconfig
    make -j $(nproc) bindeb-pkg
    ... install the resulting .deb package and confirm it successfully
    boots / problem does not exist

    # test 6.14 to ensure this is "bad"
    git checkout v6.14
    make my_defconfig
    make -j $(nproc) bindeb-pkg
    ... install the resulting .deb package and confirm it fails to
    boot / problem exists
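
As a side note, bindeb-pkg places the generated .deb files in the
parent directory of the source tree; installing one then looks roughly
like this (the file name is only an example, the actual name depends
on the version and build number):

    sudo dpkg -i ../linux-image-6.13.0_6.13.0-1_amd64.deb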

With that confirmed, the bisection can start:

    git bisect start
    git bisect good v6.13
    git bisect bad v6.14

In each bisection step git checks out a commit between the newest
known-good and the oldest known-bad commit. Test each step using:

    make my_defconfig
    make -j $(nproc) bindeb-pkg
    ... install, try to boot / verify if problem exists

and if the problem is hit run:

    git bisect bad

and if the problem doesn't trigger run:

    git bisect good

Please pay attention to always select the just-built kernel for
booting; it won't always be the default kernel picked up by GRUB.
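
After rebooting it is worth double-checking that the intended kernel
is actually the one running, e.g.:

    uname -r    # should match the version of the kernel just built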

Iterate until git announces that it has identified the first bad
commit.

Then provide the output of

    git bisect log

In the course of the bisection you might have to uninstall previous
kernels again so as not to exhaust the disk space in /boot. At the
end, please uninstall all self-built kernels again as well.
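
To get an overview of the installed kernel packages and remove the
ones no longer needed (the package name below is only an example):

    # list installed kernel image packages
    dpkg -l 'linux-image-*' | grep '^ii'
    # remove one that is no longer needed
    sudo apt remove linux-image-6.13.0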

In the ideal case this leads us to the commit that introduced the
problem.

Please let me know if anything is unclear or needs a better outline
of the steps.

Even if you are not too enthusiastic about these experiments, I
currently see them as the only way to narrow this down to a kernel
change, given that you can consistently reproduce the issue with a
given kernel version but not with the 6.12.y series (on the same
system, without other changes).

Regards,
Salvatore
