On Sat, 31 Jan 2026 14:40:06 +0000 "Hoyer, David" <[email protected]> 
wrote:
> On Fri, 23 Jan 2026 15:27:05 +0000 "Hoyer, David" <[email protected]> 
> wrote:
> > To: [email protected]
> > From: [email protected]
> > Subject: Unexpected pthread preemption
> > Package: linux-source-6.12
> > Version: 6.12.48-amd64
> > OS: Debian/Trixie
> >
> > Overview:
> > With the Debian Trixie kernel we are seeing pthreads being preempted 
> > unexpectedly. In our application's work model, we isolate a number of 
> > cores from the OS so that our application is the only thing running on 
> > them. All of the pthreads are set up as SCHED_FIFO and run at the same 
> > priority, so it should be up to each pthread to decide when to allow 
> > preemption. We also call mlockall(MCL_CURRENT|MCL_FUTURE) to lock all 
> > of the application's memory. Additionally, as SO modules are loaded we 
> > make sure all pages from these modules are pre-faulted in. The 
> > application runs with the ulimit lock limit set to infinity.
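
One way to confirm from outside the process that the setup described above is actually in effect is to read it back from the kernel. The following is only a minimal sketch, not the reporter's tooling: the function names are made up for illustration, the policy numbers are the Linux values from <sched.h>, and report() assumes a Linux /proc and permission to inspect the target PID.

```python
import os
import resource

# Linux scheduling policy numbers from <sched.h>; hard-coded so the module
# also imports on platforms where os.SCHED_FIFO is not defined.
POLICY_NAMES = {0: "SCHED_OTHER", 1: "SCHED_FIFO", 2: "SCHED_RR",
                3: "SCHED_BATCH", 5: "SCHED_IDLE", 6: "SCHED_DEADLINE"}
SCHED_RESET_ON_FORK = 0x40000000  # flag bit that may be ORed into the policy


def policy_name(raw_policy):
    """Translate a raw sched_getscheduler() value into a readable name."""
    policy = raw_policy & ~SCHED_RESET_ON_FORK
    return POLICY_NAMES.get(policy, f"unknown({policy})")


def parse_status_kb(status_text, keys=("VmLck", "VmRSS")):
    """Extract selected 'Name:   123 kB' fields from /proc/<pid>/status text."""
    values = {}
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name in keys:
            values[name] = int(rest.split()[0])
    return values


def report(pid):
    """Print policy and priority per thread plus locked-memory figures."""
    for tid in sorted(int(t) for t in os.listdir(f"/proc/{pid}/task")):
        # On Linux the sched_* calls accept a thread ID, not only a process ID.
        prio = os.sched_getparam(tid).sched_priority
        print(tid, policy_name(os.sched_getscheduler(tid)), "prio", prio)
    with open(f"/proc/{pid}/status") as f:
        print(parse_status_kb(f.read()))
    soft, _hard = resource.prlimit(pid, resource.RLIMIT_MEMLOCK)
    unlimited = resource.RLIM_INFINITY
    print("RLIMIT_MEMLOCK soft:", "infinity" if soft == unlimited else soft)
```

Calling report(<pid of the application>) should list every thread as SCHED_FIFO at the same priority. If mlockall() took effect, VmLck would be expected to be close to VmRSS.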

> >
> > Previously, on Debian/buster, we had to set 
> > vm.compact_unevictable_allowed = 0 because the default setting caused 
> > unexpected eviction, which led to similar behavior. We have confirmed 
> > that this setting is still zero.
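
The check mentioned above can be scripted so it runs alongside each test. This is a sketch under the assumption that sysctl names map to files under /proc/sys with the dots replaced by slashes. The helper names are hypothetical, and the root parameter exists only so the functions can be exercised against a scratch directory.

```python
from pathlib import Path


def read_sysctl(name, root="/proc/sys"):
    """Return the current value of a sysctl such as 'vm.swappiness' as text."""
    # sysctl names map to files under /proc/sys, dots replaced by slashes.
    return Path(root, *name.split(".")).read_text().strip()


def check_compact_unevictable(expected="0", root="/proc/sys"):
    """Return True if vm.compact_unevictable_allowed has the expected value."""
    return read_sysctl("vm.compact_unevictable_allowed", root) == expected
```

Reading the value back at runtime, rather than trusting /etc/sysctl.d, guards against the setting being overridden after boot.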

> >
> > While debugging this, we found that a pthread was switched out with 
> > prev_state=D. Looking at what was happening at that point, we determined 
> > that it was a page fault caused by the instruction it was trying to run. 
> > The faulting instruction would have run numerous times by this point, so 
> > there was no reason for it to have to fault in this page.
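
As a complementary check to the scheduler trace, the per-thread fault counters in /proc can show which thread took a fault and when. The sketch below is an assumption about how one might sample them, not what was used for this report. It relies on the field layout documented in proc(5) (state is field 3, minflt field 10, majflt field 12), and the function names are illustrative.

```python
import os
import time


def parse_task_stat(stat_line):
    """Return (state, minflt, majflt) from a /proc/<pid>/task/<tid>/stat line."""
    # The comm field is wrapped in parentheses and may itself contain spaces
    # or ')' characters, so split only after the last ')'.
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state); minflt is field 10 and majflt is field 12.
    return rest[0], int(rest[7]), int(rest[9])


def snapshot(pid):
    """Map each thread ID of pid to its (state, minflt, majflt) tuple."""
    snap = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        with open(f"/proc/{pid}/task/{tid}/stat") as f:
            snap[int(tid)] = parse_task_stat(f.read())
    return snap


def watch(pid, interval=1.0):
    """Print every thread whose fault counters grew since the last sample."""
    before = snapshot(pid)
    while True:
        time.sleep(interval)
        after = snapshot(pid)
        for tid, (state, minflt, majflt) in after.items():
            _, old_min, old_maj = before.get(tid, (state, minflt, majflt))
            if (minflt, majflt) != (old_min, old_maj):
                print(f"tid {tid} state={state} "
                      f"minflt +{minflt - old_min} majflt +{majflt - old_maj}")
        before = after
```

With all memory locked and pre-faulted, the counters of the isolated threads would be expected to stay flat, so any line printed by watch() marks a fault worth correlating with the trace.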

> >
> > We have retested using the bookworm kernel and are not seeing this issue.
> >
> > I attempted to isolate this issue. I disabled CONFIG_TRANSPARENT_HUGEPAGE 
> > but still hit the issue. I then disabled CONFIG_COMPACTION and have now 
> > run for nearly 72 hours without a failure (previously we would see 
> > failures in under 15 hours). Unfortunately, shutting off compaction is 
> > not something we want to do, but it at least appears to show that 
> > something in that area changed and is causing this issue.
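
To tie individual failures to compaction activity without rebuilding the kernel, one option is to diff the compaction and migration counters in /proc/vmstat around each event. This is a hedged sketch: it assumes the kernel exposes counters prefixed compact_ and pgmigrate_ (typically the case when compaction and migration are built in), and the helper names are made up for illustration.

```python
def parse_vmstat(text, prefixes=("compact_", "pgmigrate_")):
    """Return {counter: value} for /proc/vmstat lines with a matching prefix."""
    counters = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if name.startswith(prefixes):
            counters[name] = int(value)
    return counters


def delta(before, after):
    """Return only the counters that changed, with the amount of change."""
    return {name: after[name] - before.get(name, 0)
            for name in after if after[name] != before.get(name, 0)}


def read_vmstat(path="/proc/vmstat"):
    """Read and filter the live counters from the running kernel."""
    with open(path) as f:
        return parse_vmstat(f.read())
```

Taking read_vmstat() before and after a failure window and passing both to delta() shows whether compaction or page migration ran during that window.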

> >
> > Since the 6.1 kernel works for us and 6.12 is what fails, it will take some 
> > time to examine the changes in between to determine if a particular commit 
> > is causing this issue.

> >
> > David Hoyer
> >
> >
> >
> We started testing with 6.12.63-1 this week and it looks promising: 
> something between 6.12.48-1 and 6.12.63-1 appears to have fixed this issue. 
> It would be good to know which commit fixed it, though.

>
>

Moving to 6.12.63-1 did fix this issue.  This bug report can be closed.
