I agree that one likely explanation is the one deep.b offered earlier: 
deep C-states. Another is that irqbalance may need to be configured to 
mask core 10 so that no interrupts are scheduled on it.
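
In case it helps with the irqbalance side, here is a minimal sketch of how the mask for core 10 can be derived. It assumes your irqbalance build reads the IRQBALANCE_BANNED_CPUS environment variable as a hexadecimal bitmask (bit N set means CPU N is banned); the file that holds it varies by distribution, so check your own setup. The class and method names are made up for illustration, and the sketch does not handle the comma-separated 32-bit word format used for larger CPU counts.

```java
import java.math.BigInteger;

public class IrqMask {
    // Build a hexadecimal CPU bitmask (bit N set = CPU N) from a list of CPU ids.
    // Illustrative helper only; not part of irqbalance or any library.
    static String cpuMask(int... cpus) {
        BigInteger mask = BigInteger.ZERO;
        for (int cpu : cpus) {
            mask = mask.setBit(cpu);
        }
        return mask.toString(16);
    }

    public static void main(String[] args) {
        // CPU 10 is the isolated core discussed in this thread.
        System.out.println("IRQBALANCE_BANNED_CPUS=" + cpuMask(10));
    }
}
```

After restarting irqbalance it is worth confirming in /proc/interrupts that the counters for CPU 10 have stopped increasing.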

The kernel setting idle=poll won't work unless the BIOS settings match. For 
example, with HP hardware you need to use iLO to set *Minimum Processor 
Idle Power Core C-State* to No C-States. IBM and Dell have their own 
equivalent tools. HP also have a BIOS-level configuration, "Workload 
Profile", that is similar to what a tuned profile does in the kernel. 
Changing this profile from the default to the Low Latency profile (and 
then to Custom) bundles a number of useful settings: disabling 
hyperthreading and VT-d, setting the CPU power regulator and energy bias to 
maximum performance, enabling Turbo mode, etc.
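
One way to check from the OS side whether core 10 still enters idle states is to read the cpuidle counters in sysfs. The sketch below assumes a Linux kernel that exposes /sys/devices/system/cpu/cpuN/cpuidle/stateM/{name,usage,time} and Java 11 or newer; the class and method names are hypothetical. Which states appear depends on the idle driver and the BIOS settings, so treat the output as a hint rather than proof.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CStateReport {
    // Format one idle state's counters as a single line.
    static String formatState(String name, long usage, long timeUs) {
        return name + " usage=" + usage + " time_us=" + timeUs;
    }

    // Numeric suffix of a "stateN" directory name, used for ordering.
    static int stateIndex(Path p) {
        return Integer.parseInt(p.getFileName().toString().substring("state".length()));
    }

    // One line per idle state under cpuidleDir; empty list if the directory is absent.
    static List<String> report(Path cpuidleDir) {
        List<String> lines = new ArrayList<>();
        if (!Files.isDirectory(cpuidleDir)) {
            return lines;
        }
        try (Stream<Path> entries = Files.list(cpuidleDir)) {
            List<Path> states = entries
                .filter(p -> p.getFileName().toString().matches("state\\d+"))
                .sorted(Comparator.comparingInt(CStateReport::stateIndex))
                .collect(Collectors.toList());
            for (Path s : states) {
                String name = Files.readString(s.resolve("name")).trim();
                long usage = Long.parseLong(Files.readString(s.resolve("usage")).trim());
                long timeUs = Long.parseLong(Files.readString(s.resolve("time")).trim());
                lines.add(formatState(name, usage, timeUs));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Defaults to CPU 10, the isolated core discussed in this thread.
        int cpu = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        Path dir = Paths.get("/sys/devices/system/cpu/cpu" + cpu + "/cpuidle");
        List<String> lines = report(dir);
        if (lines.isEmpty()) {
            System.out.println("no cpuidle states exposed for cpu" + cpu);
        }
        lines.forEach(System.out::println);
    }
}
```

Running it before and after a 40ms-interval test and comparing the usage counters of the deeper states should show whether the core is leaving the polling state between tasks.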

HPE document all of this in great detail in a whitepaper, *Configuring and 
tuning HPE ProLiant Servers for low-latency applications*.

It might be interesting to see the implementation of checkForWork().

On Tuesday, February 17, 2026 at 12:22:23 AM UTC-5 Peter wrote:

> Average task execution time is less interesting than seeing raw latency 
> data. What does this app do? Is it listening to market data, customer 
> orders, doing rescheduled work? Are you using specialized NICs 
> (SolarFlare/Mellanox)? 100 µs is a long time with Skylake and newer 
> hardware.
>
> On Friday, February 13, 2026 at 3:26:04 PM UTC-5 [email protected] wrote:
>
>> No. Actually, after some retests my observation is that it happens 
>> regardless of whether the thread is pinned or not.
>> So:
>>
>> When thread *T* is pinned to CPU #10 and the task interval is set to 
>> 1ms, the average task execution time is *100 µs*. However, when the task 
>> interval is increased to 40ms on the same pinned core, the average 
>> execution time significantly degrades to *250 µs*. If T is not pinned, 
>> the result is the same.
>>
>>
>> On Friday, February 13, 2026 at 6:39:03 PM UTC+1 Mark E. Dawson, Jr. wrote:
>>
>>> Do you have a baseline for how your isolated core should perform using a 
>>> tool like 'osnoise'?
>>>
>>> On Friday, February 13, 2026 at 10:18:49 AM UTC-6 [email protected] 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> let's look at the example:
>>>>
>>>> The system is running with the following kernel parameters: 
>>>>
>>>> isolcpus=10, nohz_full=10, nohz=on, idle=poll, intel_pstate=disable. 
>>>>
>>>> We have a thread *T* that uses Thread.onSpinWait() while polling a 
>>>> lock-free shared queue. In this context, the *task interval* refers to 
>>>> the time elapsed between adding consecutive tasks to the queue.
>>>>
>>>> When thread *T* is pinned to CPU #10 and the task interval is set to 
>>>> 1ms, the average task execution time is *100 µs*. However, when the 
>>>> task interval is increased to 40ms on the same pinned core, the average 
>>>> execution time significantly degrades to *250 µs*.
>>>>
>>>> In contrast, when thread *T* is unpinned, the performance remains much 
>>>> more consistent. At a 1ms task interval, the average execution time is 
>>>> *110 µs*, and it only slightly increases to *120 µs* when the interval 
>>>> is extended to 40ms.
>>>>
>>>
