On Fri, 23 Jan 2026 15:27:05 +0000 "Hoyer, David" <[email protected]>
wrote:
> To: [email protected]
> From: [email protected]
> Subject: Unexpected pthread preemption
> Package: linux-source-6.12
> Version: 6.12.48-amd64
> OS: Debian/Trixie
>
> Overview:
> We are seeing pthreads preempted unexpectedly on the Debian Trixie kernel.
> In our application's work model, we isolate a number of cores from the OS
> so that our application is the only thing running on them. All of the
> pthreads are set up as SCHED_FIFO at the same priority, so each pthread
> should decide for itself when to allow preemption. We also call
> mlockall(MCL_CURRENT|MCL_FUTURE) to lock all of the application's memory.
> Additionally, as SO modules are loaded, we make sure all pages from these
> modules are pre-faulted in. The application runs with an unlimited
> locked-memory ulimit (ulimit -l).
>
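For anyone trying to reproduce this, here is a minimal sketch of the kind of thread setup described above. It is not our actual code: the helper names make_fifo_attr() and lock_all_memory() and the priority passed in are made up for illustration, and core isolation and pinning are assumed to be handled elsewhere.

```c
#include <pthread.h>
#include <sched.h>
#include <sys/mman.h>

/* Hypothetical helper: fill `attr` so that a thread created with it runs
 * under SCHED_FIFO at priority `prio`.
 * Returns 0 on success, otherwise an errno-style error code. */
int make_fifo_attr(pthread_attr_t *attr, int prio)
{
    struct sched_param sp = { .sched_priority = prio };
    int rc;

    rc = pthread_attr_init(attr);
    if (rc != 0)
        return rc;

    /* Without PTHREAD_EXPLICIT_SCHED the new thread inherits the creator's
     * scheduling settings and the policy set below is ignored. */
    if ((rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED)) != 0 ||
        (rc = pthread_attr_setschedpolicy(attr, SCHED_FIFO)) != 0 ||
        (rc = pthread_attr_setschedparam(attr, &sp)) != 0) {
        pthread_attr_destroy(attr);
        return rc;
    }
    return 0;
}

/* Hypothetical helper: lock all current and future mappings.
 * Returns 0 on success, -1 with errno set (e.g. insufficient RLIMIT_MEMLOCK). */
int lock_all_memory(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}
```

Note that pthread_create() with such an attribute needs the privilege to use real-time policies, so only the attribute setup can be exercised as an unprivileged user.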
> Previously, on Debian/buster, we had to set
> vm.compact_unevictable_allowed = 0 because the default setting caused
> unexpected eviction, which led to similar behavior. We have confirmed that
> this setting is still zero.
>
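One way to confirm that setting from inside the application at startup is to read the sysctl's procfs file directly. This is only a sketch; the function names are hypothetical and not taken from our code.

```c
#include <stdio.h>

/* Read a single integer from a procfs-style file.
 * Returns 0 and stores the value on success, -1 on any failure. */
int read_int_file(const char *path, long *value)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Returns 1 if compaction is allowed to examine mlocked (unevictable)
 * pages, 0 if it is not, and -1 if the sysctl cannot be read. */
int compaction_may_move_unevictable(void)
{
    long v;
    if (read_int_file("/proc/sys/vm/compact_unevictable_allowed", &v) != 0)
        return -1;
    return v != 0;
}
```

A return value of 0 from compaction_may_move_unevictable() corresponds to the setting described above.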
> While debugging this, we found that a pthread had been switched out with
> prev_state=D. Looking at what was happening at that point, we determined
> that the cause was a page fault on the instruction the thread was trying
> to execute. The faulting instruction would already have run numerous times
> by this point, so there was no reason for its page to need faulting in.
>
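A cheap way to notice such faults from user space is to sample the fault counters that getrusage() reports before and after a critical section. The sketch below uses RUSAGE_SELF, which is process-wide; on Linux, RUSAGE_THREAD (which needs _GNU_SOURCE) would narrow it to the calling thread. The struct and function names are hypothetical.

```c
#include <sys/resource.h>

/* Snapshot of the page-fault counters reported by getrusage(). */
struct fault_counts {
    long minor;  /* faults serviced without I/O */
    long major;  /* faults that required I/O */
};

/* Read the current process-wide fault counters.
 * Returns 0 on success, -1 on failure. */
int read_fault_counts(struct fault_counts *out)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    out->minor = ru.ru_minflt;
    out->major = ru.ru_majflt;
    return 0;
}

/* Faults taken between two snapshots (after minus before). */
struct fault_counts fault_delta(const struct fault_counts *before,
                                const struct fault_counts *after)
{
    struct fault_counts d = {
        .minor = after->minor - before->minor,
        .major = after->major - before->major,
    };
    return d;
}
```

With everything locked and pre-faulted, a non-zero delta around steady-state code would be the unexpected case worth logging.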
> We have retested using the bookworm kernel and do not see this issue.
>
> I attempted to isolate this issue. I disabled CONFIG_TRANSPARENT_HUGEPAGE
> but still hit the issue. I then disabled CONFIG_COMPACTION and have now run
> for nearly 72 hours without a failure (previously we would see failures in
> under 15 hours). Unfortunately, turning off COMPACTION is not something we
> want to do, but it at least appears to show that something in that area
> changed and is causing this issue.
>
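On a kernel that still has compaction enabled, one possible way to correlate failures with compaction activity is to sample counters from /proc/vmstat around a failure window. This is a sketch only: read_vmstat_counter() is a hypothetical helper, and the counter names mentioned afterwards are assumed to be present on the kernel under test.

```c
#include <stdio.h>
#include <string.h>

/* Look up one counter by exact name in a /proc/vmstat-style file, where
 * each line is "name value".
 * Returns 0 and stores the value on success, -1 if the file cannot be
 * read or the counter is not present. */
int read_vmstat_counter(const char *path, const char *name,
                        unsigned long long *value)
{
    char key[128];
    unsigned long long v;
    int found = -1;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (fscanf(f, "%127s %llu", key, &v) == 2) {
        if (strcmp(key, name) == 0) {
            *value = v;
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}
```

For example, reading compact_stall and pgmigrate_success from "/proc/vmstat" before and after a failure would show whether compaction or page migration ran in between.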
> Since the 6.1 kernel works for us and 6.12 is what fails, it will take some
> time to examine the changes in between to determine if a particular commit is
> causing this issue.
>
> David Hoyer
>
>
>
We started testing with 6.12.63-1 this week, and the results so far suggest
that something between 6.12.48-1 and 6.12.63-1 fixed this issue. It would
still be good to know which commit fixed it.