On Fri, 23 Jan 2026 15:27:05 +0000 "Hoyer, David" <[email protected]> 
wrote: 
> To: [email protected] 
> From: [email protected] 
> Subject: Unexpected pthread preemption 
> Package: linux-source-6.12 
> Version: 6.12.48-amd64 
> OS: Debian/Trixie 
> 
> Overview: 
> With the Debian Trixie kernel, we are seeing pthreads being preempted 
> unexpectedly.  In our application's work model, we isolate a number of 
> cores from the OS so that our application is the only thing running on 
> those cores.  All of the pthreads are set up as SCHED_FIFO and run at the 
> same priority, so it should be up to each pthread to decide when to allow 
> preemption.  We also call mlockall(MCL_CURRENT|MCL_FUTURE) to lock all of 
> the application's memory.  Additionally, as shared-object (SO) modules are 
> loaded, we make sure all pages from these modules are pre-faulted in.  The 
> application runs with the locked-memory ulimit set to infinity.
> 
> Previously, in Debian/buster, we had to set vm.compact_unevictable_allowed = 0 
> because the default setting was causing unexpected eviction, which led to 
> similar behavior.  We have confirmed that this setting is still zero.
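
For anyone trying to reproduce this, a minimal sketch of how the sysctl above could be checked at runtime. It only assumes the usual mapping of sysctl names to files under /proc/sys; the helper names and the `root` parameter are illustrative, not part of any existing tool.

```python
from pathlib import Path


def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    """Return the current value of a sysctl such as
    'vm.compact_unevictable_allowed'.

    Dots in the name map to directory separators under /proc/sys.
    `root` is a parameter only so the function can be pointed at a test tree.
    """
    path = Path(root).joinpath(*name.split("."))
    return path.read_text().strip()


def compaction_may_touch_unevictable(root: str = "/proc/sys") -> bool:
    """True if the kernel is allowed to compact unevictable (mlocked) pages."""
    return read_sysctl("vm.compact_unevictable_allowed", root) != "0"
```

Calling `compaction_may_touch_unevictable()` on the affected host should return False if the setting described above is in effect.
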
> 
> While debugging this, we found that a pthread was switched out with 
> prev_state=D (uninterruptible sleep).  Looking at what was happening at that 
> point, we determined that it was a page fault caused by the instruction it 
> was trying to run.  In this case the faulting instruction would have run 
> numerous times by this point, so there was no reason for it to have to fault 
> in this page.
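
One way to spot such faults from outside the application, without touching the real-time threads themselves, is to poll the per-thread fault counters in /proc. The sketch below is an assumption about how this could be monitored, not what was actually used here; it relies on the field layout documented in proc(5), where minflt is field 10 and majflt is field 12 of the stat line. The function names are hypothetical.

```python
import os


def parse_stat_faults(stat_line: str) -> tuple[int, int]:
    """Extract (minflt, majflt) from one /proc/<pid>/task/<tid>/stat line.

    The comm field (2nd) is wrapped in parentheses and may itself contain
    spaces or ')', so split on the LAST ')' before tokenising the rest.
    """
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); minflt is field 10, majflt is field 12.
    return int(rest[7]), int(rest[9])


def thread_faults(pid: int) -> dict[int, tuple[int, int]]:
    """Map each thread id of `pid` to its (minflt, majflt) counters."""
    faults = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                faults[int(tid)] = parse_stat_faults(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # thread exited between listdir() and the read
    return faults


def fault_deltas(before, after):
    """Return {tid: (d_minflt, d_majflt)} for threads whose counters grew.

    Threads missing from `before` are compared against (0, 0).
    """
    deltas = {}
    for tid, (minflt, majflt) in after.items():
        old_min, old_maj = before.get(tid, (0, 0))
        delta = (minflt - old_min, majflt - old_maj)
        if delta != (0, 0):
            deltas[tid] = delta
    return deltas
```

Taking a `thread_faults(pid)` snapshot once the application has warmed up and comparing later snapshots with `fault_deltas()` should show no growth for the isolated threads if mlockall and pre-faulting are holding; any non-empty result would be worth correlating with the scheduler trace.
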
> 
> We have retested with the bookworm kernel and do not see this issue. 
> 
> I attempted to isolate this issue.  I disabled 
> CONFIG_TRANSPARENT_HUGEPAGE but still hit the issue.  I then disabled 
> CONFIG_COMPACTION and have now run for nearly 72 hours without a failure 
> (previously we would see failures in under 15 hours).  Unfortunately, 
> shutting off COMPACTION is not something we want to do, but it at least 
> appears to show that something in that area changed and is causing this 
> issue.
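
On a kernel that still has CONFIG_COMPACTION enabled, the compaction and page-migration counters in /proc/vmstat may help correlate a failure with compaction activity. This is a hedged sketch: it matches counters by prefix rather than by a fixed list, since which counters are present depends on the kernel configuration, and the function names are made up for illustration.

```python
def parse_compaction_counters(vmstat_text: str) -> dict[str, int]:
    """Pick the compaction and page-migration counters out of /proc/vmstat text.

    Each vmstat line is '<name> <value>'; only names starting with
    'compact_' or 'pgmigrate_' are kept.
    """
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith(("compact_", "pgmigrate_")):
            counters[parts[0]] = int(parts[1])
    return counters


def read_compaction_counters(path: str = "/proc/vmstat") -> dict[str, int]:
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_compaction_counters(f.read())
```

Logging `read_compaction_counters()` periodically alongside the scheduler trace would show whether the counters moved shortly before a thread went into prev_state=D.
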
> 
> Since the 6.1 kernel works for us and 6.12 is what fails, it will take some 
> time to examine the changes in between to determine if a particular commit is 
> causing this issue.
> 
> David Hoyer 
We started testing with 6.12.63-1 this week, and early results suggest that 
something between 6.12.48-1 and 6.12.63-1 fixed this issue.  It would still be 
good to know which commit fixed it.
