Re: [PR] [EXPERIMENTAL]sched/hrtimer: Part 2: refine hrtimer state machine and introduce scheduler support with hrtimer [nuttx]

via GitHub Mon, 05 Jan 2026 05:54:07 -0800


wangchdo commented on PR #17573:
URL: https://github.com/apache/nuttx/pull/17573#issuecomment-3710512370


   > > Thank you @wangchdo! This change builds fine now. @Fix-Point provides 
some important feedback as he seems to work on this already. I like approach of 
@wangchdo to keep things compatible and aligned with existing API. Maybe 
additional checks / protections may be implemented to avoid situations 
described by @Fix-Point? :-)
   > 
   > > 
   > 
   > > @Fix-Point can you please provide exact test scenario steps to verify 
problems you mentioned on @wangchdo solution? This should confirm if the 
implementation requires some additional protections? :-)
   > 
   > 
   > 
   > I have provided @wangchdo  with 4-5 test cases and believe I have been 
very patient in explaining why this approach will not work. However, he still 
refuses to accept my reasoning.
   > 
   > 
   > 
   > Here is the detailed explanation:
   > 
   > 
   > 
   > First, to ensure concurrency correctness, we must adhere to resource 
invariants, such as the ownership invariant. That is, only one user can update 
a shared object at any given time.
   > 
   > 
   > 
   > In the hrtimer design, there may be situations where a new hrtimer is 
**already being reset while the callback function of the old hrtimer is still 
executing**. For such scenarios, we must ensure in the hrtimer design that "a 
cancelled timer cannot modify the hrtimer object again." In other words, the 
execution core of the old timer callback loses ownership of the hrtimer object. 
Otherwise, it could lead to the old timer overwriting the new timer, which is 
an error. This is why I refer to his implementation as "functionally incorrect" 
(i.e., it violates the specification).
   > 
   > 
   > 
   > **The key problem of its implementation is the violation of the ownership 
invariant, which is impossible to fix without introducing extra memory 
overhead**. Let me elaborate on why his latest design still violates the 
ownership invariant:  
   > 
   > After callback execution completes, a periodic timer requeues itself. In 
his latest implementation, the requeue condition is:
   > 
   > ```c
   > 
   > // executing callback...
   > 
   > if (hrtimer->expired != UINT64_MAX && hrtimer->expired != expired)
   > 
   > ```
   > 
   > It violates the ownership invariant when:
   > 
   > 1. If the callback function for the current `hrtimer->expired = t1` is 
executing and returns 0.
   > 
   > 2. While the callback is executing, the timer is cancelled and a new timer 
is started with `hrtimer->expired = t2`.
   > 
   > 3. When the callback finishes, it finds `hrtimer->expired != t1` and 
mistakenly requeue the one-shot timer as a periodic timer, causing a system 
crash.
   > 
   > 
   > 
   > 2. Even reverting to the original implementation still violates the 
invariant because **`expired` is not monotonically increasing and cannot be 
used as a version**. It is possible for the old and new timers to have the same 
`expired` time, allowing a cancelled timer to modify the new timer.
   > 
   > 
   > 
   > ```c
   > 
   > // executing callback...
   > 
   > if (period > 0 && hrtimer->expired == expired)
   > 
   > ```
   > 
   > 
   > 
   > It violates the ownership invariant when:
   > 
   > 1. If a periodic timer `hrtimer->expired = t1`, `hrtimer->func = 
period_func` is executing.
   > 
   > 2. While the callback is executing, the timer is cancleled and a new timer 
is started with `hrtimer->expired = t1`, `hrtimer->func = oneshot_func`.
   > 
   > 3. When the cancelled `period_func` finishes, while `oneshot_func` is 
still executing, the cancelled `period_func` overwrites the newly set oneshot 
timer, causing an error.
   > 
   > 
   > 
   > Even if we add a check for `func` as suggested by @wangchdo , problems 
remain:
   > 
   > ```c
   > 
   > // executing callback...
   > 
   > if (period > 0 && hrtimer->expired == expired && hrtimer->func == func)
   > 
   > ```
   > 
   > 
   > 
   > It violates the ownership invariant when:
   > 
   > 1. If a periodic timer `hrtimer->expired = t1`, `hrtimer->func = 
period_func` is executing, and `period_func` returns the next delay `p1`.
   > 
   > 2. At time `t1`, while the hrtimer callback is executing, the periodic 
timer is cancelled, and a new periodic timer is set to trigger immediately with 
`hrtimer->expired = t1`, `hrtimer->func = period_func`, which returns the next 
delay `p2`.
   > 
   > 3. When the cancelled previous callback finishes, while the new callback 
is still executing, all version checks pass, and the next timer is set to 
`hrtimer->expired = t1 + p1`, overwriting the new timer (the expected next 
expired time should be `hrtimer->expired = t1 + p2`).
   > 
   > 
   > 
   > The correct solution is to add a monotonically increasing version field, 
`version`, to the hrtimer. When an hrtimer is cancelled, `version` increments 
by 1, ensuring that **an cancelled old timer with old version cannot overwrite 
a newly set timer with newer version**. However, **this would inevitably 
increase the memory footprint of the `hrtimer`**. I considered this approach 
early in the design phase but ultimately opted for `hazard pointers` to avoid 
the memory overhead.
   > 
   > 
   > 
   > BTW, I feel that @wangchdo  has shown little respect for me:
   > 
   > 1.  I and @wangchdo each implemented our hrtimer independently, with 
almost no communication during the process. These implementations are 
completely different. In mid-December, @xiaoxiang781216  suggested that I share 
my design with @wangchdo . During our exchange, I pointed out his errors 
regarding SMP, but he refused to acknowledge them and accused me of stealing 
his ideas. He also asked me to submit my incomplete internal hrtimer 
implementation to the community.
   > 
   > 2.  Out of respect for his opinion, I submitted my design and demonstrated 
that it outperforms his in both functional correctness and performance. After I 
submitted my code implementation to the community, to my surprise, without any 
discussion, @anchao forcibly merged @wangchdo's functional incorrect 
implementation into the upstream, which made me feel very very uncomfortable. 
   > 
   > 4. Yet, @wangchdo  consistently refused to replace his implementation with 
mine. I cannot understand why he remains so stubborn.
   > 
   > 5. He rarely tests his own code; even pulling his PR and compiling it 
reveals compilation issues. He repeatedly pressured me to provide test cases 
for him but never offered any in return, which I find insulting.
   > 
   > 6. I have consistently tried to reason with him, presenting performance 
test results, interface comparisons, and memory usage analyses. However, he 
refuses to acknowledge them and has not provided any convincing evidence to 
support his stance.
   > 
   > 
   > 
   > I recommend that @wangchdo  refer to my design and adopt hazard pointers, 
which are an optimal solution for memory efficiency. I have no issue with him 
using my design consideration (in fact, hazard pointers is memory reclamation 
technique proposed by others). Similar design goals inevitably lead to 
convergent designs, and this is entirely natural.
   > 
   > 
   > 
   > Finally, Could you please help me review my implementation 
https://github.com/apache/nuttx/pull/17675? I welcome any feedback and believe 
that, at the very least, my implementation is superior to his in terms of 
functional correctness, design completeness, documentation, performance, 
scalability, and code reusability. =)
   > 
   > 
   > 
   > Regarding the APIs, I have made every effort to align it with the original 
design. However, some aspects are inherently tied to the design and difficult 
to change. For example, my implementation encodes the state in `hrtimer->func`, 
which means restarting the timer requires passing the `func`. Besides, 
according to test results, this does not impact performance.
   
   @Fix-Point  This is my last response to you:
   
   It is just about design choice 
   when we decide how to handle concurrency issues. When you make a decision of 
whether the implementation is right, you only need to consider if the behavior 
is what we want.
   
   Anyway, I don't want to talk about anything else not related to technical 
stuff. And I don't want judging anyone personally happens in our community.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [EXPERIMENTAL]sched/hrtimer: Part 2: refine hrtimer state machine and introduce scheduler support with hrtimer [nuttx]

Reply via email to