Re: [PR] [EXPERIMENTAL]sched/hrtimer: Part 2: refine hrtimer state machine and introduce scheduler support with hrtimer [nuttx]

via GitHub Mon, 05 Jan 2026 05:32:18 -0800


Fix-Point commented on PR #17573:
URL: https://github.com/apache/nuttx/pull/17573#issuecomment-3710441155


   > Thank you @wangchdo! This change builds fine now. @Fix-Point provides some 
important feedback as he seems to work on this already. I like approach of 
@wangchdo to keep things compatible and aligned with existing API. Maybe 
additional checks / protections may be implemented to avoid situations 
described by @Fix-Point? :-)
   > 
   > @Fix-Point can you please provide exact test scenario steps to verify 
problems you mentioned on @wangchdo solution? This should confirm if the 
implementation requires some additional protections? :-)
   
   I have provided @wangchdo  with 4-5 test cases and believe I have been very 
patient in explaining why this approach will not work. However, he still 
refuses to accept my reasoning.
   
   Here is the detailed explanation:
   
   First, to ensure concurrency correctness, we must adhere to resource 
invariants, such as the ownership invariant. That is, only one user can update 
a shared object at any given time.
   
   In the hrtimer design, there may be situations where a new hrtimer is 
**already being reset while the callback function of the old hrtimer is still 
executing**. For such scenarios, we must ensure in the hrtimer design that "a 
cancelled timer cannot modify the hrtimer object again." In other words, the 
execution core of the old timer callback loses ownership of the hrtimer object. 
Otherwise, it could lead to the old timer overwriting the new timer, which is 
an error. This is why I refer to his implementation as "functionally incorrect" 
(i.e., it violates the specification).
   
   **The key problem of its implementation is the violation of the ownership 
invariant, which is impossible to fix without introducing extra memory 
overhead**. Let me elaborate on why his latest design still violates the 
ownership invariant:  
   After callback execution completes, a periodic timer requeues itself. In his 
latest implementation, the requeue condition is:
   ```c
   // executing callback...
   if (hrtimer->expired != UINT64_MAX && hrtimer->expired != expired)
   ```
   It violates the ownership invariant when:
   1. If the callback function for the current `hrtimer->expired = t1` is 
executing and returns 0.
   2. While the callback is executing, the timer is cancelled and a new timer 
is started with `hrtimer->expired = t2`.
   3. When the callback finishes, it finds `hrtimer->expired != t1` and 
mistakenly requeue the one-shot timer as a periodic timer, causing a system 
crash.
   
   2. Even reverting to the original implementation still violates the 
invariant because **`expired` is not monotonically increasing and cannot be 
used as a version**. It is possible for the old and new timers to have the same 
`expired` time, allowing a cancelled timer to modify the new timer.
   
   ```c
   // executing callback...
   if (period > 0 && hrtimer->expired == expired)
   ```
   
   It violates the ownership invariant when:
   1. If a periodic timer `hrtimer->expired = t1`, `hrtimer->func = 
period_func` is executing.
   2. While the callback is executing, the timer is cancleled and a new timer 
is started with `hrtimer->expired = t1`, `hrtimer->func = oneshot_func`.
   3. When the cancelled `period_func` finishes, while `oneshot_func` is still 
executing, the cancelled `period_func` overwrites the newly set oneshot timer, 
causing an error.
   
   Even if we add a check for `func` as suggested by @wangchdo , problems 
remain:
   ```c
   // executing callback...
   if (period > 0 && hrtimer->expired == expired && hrtimer->func == func)
   ```
   
   It violates the ownership invariant when:
   1. If a periodic timer `hrtimer->expired = t1`, `hrtimer->func = 
period_func` is executing, and `period_func` returns the next delay `p1`.
   2. At time `t1`, while the hrtimer callback is executing, the periodic timer 
is cancelled, and a new periodic timer is set to trigger immediately with 
`hrtimer->expired = t1`, `hrtimer->func = period_func`, which returns the next 
delay `p2`.
   3. When the cancelled previous callback finishes, while the new callback is 
still executing, all version checks pass, and the next timer is set to 
`hrtimer->expired = t1 + p1`, overwriting the new timer (the expected next 
expired time should be `hrtimer->expired = t1 + p2`).
   
   The correct solution is to add a monotonically increasing version field, 
`version`, to the hrtimer. When an hrtimer is cancelled, `version` increments 
by 1, ensuring that **an cancelled old timer with old version cannot overwrite 
a newly set timer with newer version**. However, **this would inevitably 
increase the memory footprint of the `hrtimer`**. I considered this approach 
early in the design phase but ultimately opted for `hazard pointers` to avoid 
the memory overhead.
   
   BTW, I feel that @wangchdo  has shown little respect for me:
   1.  I and @wangchdo each implemented our hrtimer independently, with almost 
no communication during the process. These implementations are completely 
different. In mid-December, @xiaoxiang781216  suggested that I share my design 
with @wangchdo . During our exchange, I pointed out his errors regarding SMP, 
but he refused to acknowledge them and accused me of stealing his ideas. He 
also asked me to submit my incomplete internal hrtimer implementation to the 
community. Out of respect for his opinion, I submitted my design and 
demonstrated that it outperforms his in both functional correctness and 
performance. Yet, he consistently refused to replace his implementation with 
mine. I cannot understand why he remains so stubborn.
   2. He rarely tests his own code; even pulling his PR and compiling it 
reveals compilation issues. He repeatedly pressured me to provide test cases 
for him but never offered any in return, which I find insulting.
   3. I have consistently tried to reason with him, presenting performance test 
results, interface comparisons, and memory usage analyses. However, he refuses 
to acknowledge them and has not provided any convincing evidence to support his 
stance.
   
   I recommend that @wangchdo  refer to my design and adopt hazard pointers, 
which are an optimal solution for memory efficiency. I have no issue with him 
using my design consideration (in fact, hazard pointers is memory reclamation 
technique proposed by others). Similar design goals inevitably lead to 
convergent designs, and this is entirely natural.
   
   Finally, Could you please help me review my implementation? I welcome any 
feedback and believe that, at the very least, my implementation is superior to 
his in terms of functional correctness, design completeness, documentation, 
performance, scalability, and code reusability.
   
   Regarding the APIs, I have made every effort to align it with the original 
design. However, some aspects are inherently tied to the design and difficult 
to change. For example, my implementation encodes the state in `hrtimer->func`, 
which means restarting the timer requires passing the `func`. Besides, 
according to test results, this does not impact performance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [EXPERIMENTAL]sched/hrtimer: Part 2: refine hrtimer state machine and introduce scheduler support with hrtimer [nuttx]

Reply via email to