wangchdo commented on PR #17573: URL: https://github.com/apache/nuttx/pull/17573#issuecomment-3701692113
> > > @anchao @wangchdo I have serious doubts about your professionalism. > > > > @wangchdo's HRTimer implementation has **fundamental functional correctness issues. It violates the ownership invariant of hrtimer**: that only one reader/writer can use the hrtimer object at a time. This violation makes it prone to issues in SMP scenarios, as I have consistently pointed out over and over again. Simple modifications like reference counting can not fix this implementation at all. > > > > > > > > > Why are you allowing such a functional incorrect code to be merged into the master branch. Isn’t this simply a case of factional bias clouding judgment? Please check this, thanks. [#17675 (comment)](https://github.com/apache/nuttx/pull/17675#issuecomment-3698712790) > > > > > > @Fix-Point This PR addresses the SMP corner cases you mentioned. Could you please take a look? I propose this solution for the following reasons: > > > > 1. It allows separate optimization paths for SMP and non-SMP, leaving room for extreme performance and strong determinism in both cases. (per @xiaoxiang781216 's comments I will submit a separate PR to put the SMP specific logics in CONFIG_SMP ) > > 2. It provides clean, easy-to-use and minimal user-facing APIs — only three core APIs, with no duplicated parameters: > > `hrtimer_init()` > > `hrtimer_start()` > > `hrtimer_cancel() / hrtimer_cancel_sync()` > > 3. It provides basement for queue abstraction if needed. > > > > @xiaoxiang781216 I resolved you comments, could you please take a look again @acassis @anchao please double check > > To behonest, you've only modified the code to prevent the test program from getting stuck there. You haven't solved the actual problem: **canceled periodic timer cannot overwrite the new timer**. > > I've emphasized this many times. Why make futile attempts again and again? Do you know how much of our time you've wasted? > > > More similar concurrency issues could be cited here. As I have emphasized again and again, **the fundamental problem is the violation of the ownership invariant of hrtimer: only one owner can modify the hrtimer object at a time**. > > **Designing functionally correct concurrent algorithms is not easy at all. Relying solely on engineering experience is insufficient; theoretical methods are necessary to avoid errors, such as adapting resource invariants and using structured diagrams to clarify every possible concurrent state transition**. @wangchdo's design failed to consider how to handle concurrency correctly, making it nearly impossible to improve upon his code base. > > > I appreciate your use of **reference counting** in the latest version to solve memory reclamation issues. However, the primary problem is that **reference counting loses version information**. A version‑aware method, such as **Epoch‑based memory reclamation** (a typical kernel implementation is Linux’s QSBR‑RCU), is needed to address this. > > As I mentioned in a previous scenario: > > > > * **t0**: Thread 0 on Core 0 starts a periodic timer with callback `CA`, initial delay `1000 ns`, period `UINT32_MAX`. > > * **t1**: The timer fires on Core 0, and callback `CA` is executing. > > * **t2**: Thread 0 is scheduled onto Core 1, it cancels the same timer, and restarts the timer with callback `CB`, initial delay `0 ns`, period `1 000 000 ns`. > > * **t3**: The timer fires on Core 1, and callback `CB` starts executing. > > * **t4**: Callback `CA` on Core 0 finishes, finds the `hrtimer` in a “running” state (but not of this version), and attempts to restart the timer by setting `hrtimer->expired += UINT32_MAX`. > > > > Here, two errors occur: > > > > * The old periodic timer that should have been canceled continues, overwriting the newly started timer. > > * After the old timer restarts, its next trigger time is `t2 + UINT32_MAX`, which is incorrect—it should have been `t0 + 1000 ns + UINT32_MAX`. > > > > **In my HRTimer implementation, this situation is impossible:** > > > > * **t0**: Thread 0 on Core 0 starts a periodic timer with callback `CA`, initial delay `1000 ns`, period `UINT32_MAX`. > > * **t1**: The timer fires on Core 0, and callback `CA` is executing. > > * **t2**: Thread 0 is scheduled onto Core 1, it cancels the same timer, **taking ownership** of the `hrtimer` data structure from Core 0, and restarts the timer with callback `CB`, initial delay `0 ns`, period `1 000 000 ns`. > > * **t3**: The timer fires on Core 1, and callback `CB` starts executing. > > * **t4**: Callback `CA` on Core 0 finishes, checks the hazard pointer and realizes it has lost ownership of the `hrtimer`, so it simply exits without further action. > > > > The key difference is that **reference counting loses version information, while hazard pointers do not**. Canceling a timer via hazard pointers ensures that, by time **t2**, all cores using the `hrtimer` have lost ownership of the hrtimer, thereby **guaranteeing that an old timer callback can never overwrite a new timer**. > > **Designing correct concurrent algorithms requires systematic consideration**. As I stated in #17570, after carefully reviewing your implementation, I could not really find a good solution that preserves version information without introducing significant performance or memory-overhead overhead. > > That is why I suggested we not spend additional time on the previous design and instead directly adopt this implementation. I do not disrespect your work or doubt your abilities—I acknowledge you as a diligent and excellent engineer. I simply do not wish to see anyone stumble twice on the same issue. I suggest reviewing your API design. The issues you mentioned stem from differences between your API design and mine, as I already pointed out in my comments after reviewing your implementation. In my view, your current API design is less intuitive and more error-prone, which makes such issues likely to occur. Also, I am not wasting anyone’s time—I am simply proposing solutions that I believe are better. By the way, I provides many comments in your implementation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
