On Sat, 31 Jan 2026 14:40:06 +0000 "Hoyer, David" <[email protected]> wrote:
> On Fri, 23 Jan 2026 15:27:05 +0000 "Hoyer, David" <[email protected]>
> wrote:
> > To: [email protected]
> > From: [email protected]
> > Subject: Unexpected pthread preemption
> > Package: linux-source-6.12
> > Version: 6.12.48-amd64
> > OS: Debian/Trixie
> >
> > Overview:
> > We are seeing with Debian Trixie kernel that pthreads are being preempted
> > unexpectedly. In our work model for our application, we isolate a number
> > of cores from the OS such that our application is the only thing running on
> > these isolated cores. All of the pthreads are set up as SCHED_FIFO and
> > running at the same priority such that it should be up to the pthread when
> > it wants to allow preemption. We also run
> > mlockall(MCL_CURRENT|MCL_FUTURE) to lock all of the memory in this
> > application. Additionally, as SO modules are loaded we make sure all pages
> > from these modules are pre-faulted in. The application is running with
> > ulimit lock limit == infinity.
> >
> > Previously in Debian/buster, we had to add
> > vm.compact_unevictable_allowed = 0 since the default setting was causing
> > unexpected eviction which led to similar behaviors. We have confirmed
> > that this setting is still set to zero.
> >
> > In debugging this, we found that a pthread was transitioned out due to
> > prev_state=D. In looking at what was happening at that point it was
> > determined that it was a page fault due to the instruction it was trying to
> > run. In this case the faulting instruction would have run numerous times
> > by this point so there was no reason for it to have to fault in this page.
> >
> > We have retested using the bookworm kernel and are not seeing this issue.
> >
> > I performed an attempt at isolating this issue. I disabled
> > CONFIG_TRANSPARENT_HUGEPAGE but still hit the issue. I then disabled
> > CONFIG_COMPACTION and have now run for nearly 72hrs without a failure
> > (previously we would see failures in under 15hrs). Unfortunately shutting
> > off COMPACTION is not something we want to do but it at least appears to
> > prove that something in that realm changed which is causing this issue.
> >
> > Since the 6.1 kernel works for us and 6.12 is what fails, it will take some
> > time to examine the changes in between to determine if a particular commit
> > is causing this issue.
> >
> > David Hoyer
> >
>
> We started testing with 6.12.63-1 this week and it is showing promise that
> something between 6.12.48-1 and 6.12.63-1 fixed this issue. It would be good
> to know which commit fixed it though.

Moving to 6.12.63-1 did fix this issue. This bug report can be closed.
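
For reference, the setup described in the original report (mlockall plus
same-priority SCHED_FIFO pthreads) can be sketched roughly as below. This is
a minimal illustration under assumptions, not the reporter's actual code: the
helper names lock_all_memory, make_fifo_attr and thread_page_faults are
hypothetical, the priority is whatever the caller passes in, and core
isolation and SO pre-faulting are not covered. The page-fault counter is one
possible way to notice a fault like the one described, by sampling it before
and after a section that should already be fully resident.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* RUSAGE_THREAD is Linux-specific; glibc only exposes it with _GNU_SOURCE.
 * Fallback (assumption): the Linux kernel value is 1. */
#ifndef RUSAGE_THREAD
#define RUSAGE_THREAD 1
#endif

/* Lock all current and future mappings into RAM.
 * Returns 0 on success, -1 on failure (errno set), e.g. if the
 * locked-memory ulimit is too low. */
int lock_all_memory(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/* Initialise `attr` so that a thread created with it runs SCHED_FIFO at
 * priority `prio`. Returns 0 on success or a pthread error number. */
int make_fifo_attr(pthread_attr_t *attr, int prio)
{
    struct sched_param sp = { .sched_priority = prio };
    int rc;

    if ((rc = pthread_attr_init(attr)) != 0)
        return rc;
    /* Without PTHREAD_EXPLICIT_SCHED the new thread inherits the creator's
     * policy and the settings below are ignored. */
    if ((rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED)) != 0)
        return rc;
    if ((rc = pthread_attr_setschedpolicy(attr, SCHED_FIFO)) != 0)
        return rc;
    return pthread_attr_setschedparam(attr, &sp);
}

/* Minor + major page faults taken so far by the calling thread,
 * or -1 if getrusage fails. */
long thread_page_faults(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_THREAD, &ru) != 0)
        return -1;
    return ru.ru_minflt + ru.ru_majflt;
}
```

A thread would typically be started with pthread_create(&tid, &attr, fn, arg)
after make_fifo_attr(&attr, prio); that call usually fails with EPERM unless
the process has CAP_SYS_NICE or a suitable RLIMIT_RTPRIO. Inside the thread,
comparing thread_page_faults() before and after a warmed-up section gives a
cheap check: a non-zero difference there would point at the kind of unexpected
fault discussed above.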

