wangchdo commented on PR #17573: URL: https://github.com/apache/nuttx/pull/17573#issuecomment-3710512370
> > Thank you @wangchdo! This change builds fine now. @Fix-Point provides some important feedback as he seems to work on this already. I like approach of @wangchdo to keep things compatible and aligned with existing API. Maybe additional checks / protections may be implemented to avoid situations described by @Fix-Point? :-) > > > > > > @Fix-Point can you please provide exact test scenario steps to verify problems you mentioned on @wangchdo solution? This should confirm if the implementation requires some additional protections? :-) > > > > I have provided @wangchdo with 4-5 test cases and believe I have been very patient in explaining why this approach will not work. However, he still refuses to accept my reasoning. > > > > Here is the detailed explanation: > > > > First, to ensure concurrency correctness, we must adhere to resource invariants, such as the ownership invariant. That is, only one user can update a shared object at any given time. > > > > In the hrtimer design, there may be situations where a new hrtimer is **already being reset while the callback function of the old hrtimer is still executing**. For such scenarios, we must ensure in the hrtimer design that "a cancelled timer cannot modify the hrtimer object again." In other words, the execution core of the old timer callback loses ownership of the hrtimer object. Otherwise, it could lead to the old timer overwriting the new timer, which is an error. This is why I refer to his implementation as "functionally incorrect" (i.e., it violates the specification). > > > > **The key problem of its implementation is the violation of the ownership invariant, which is impossible to fix without introducing extra memory overhead**. Let me elaborate on why his latest design still violates the ownership invariant: > > After callback execution completes, a periodic timer requeues itself. In his latest implementation, the requeue condition is: > > ```c > > // executing callback... > > if (hrtimer->expired != UINT64_MAX && hrtimer->expired != expired) > > ``` > > It violates the ownership invariant when: > > 1. If the callback function for the current `hrtimer->expired = t1` is executing and returns 0. > > 2. While the callback is executing, the timer is cancelled and a new timer is started with `hrtimer->expired = t2`. > > 3. When the callback finishes, it finds `hrtimer->expired != t1` and mistakenly requeue the one-shot timer as a periodic timer, causing a system crash. > > > > 2. Even reverting to the original implementation still violates the invariant because **`expired` is not monotonically increasing and cannot be used as a version**. It is possible for the old and new timers to have the same `expired` time, allowing a cancelled timer to modify the new timer. > > > > ```c > > // executing callback... > > if (period > 0 && hrtimer->expired == expired) > > ``` > > > > It violates the ownership invariant when: > > 1. If a periodic timer `hrtimer->expired = t1`, `hrtimer->func = period_func` is executing. > > 2. While the callback is executing, the timer is cancleled and a new timer is started with `hrtimer->expired = t1`, `hrtimer->func = oneshot_func`. > > 3. When the cancelled `period_func` finishes, while `oneshot_func` is still executing, the cancelled `period_func` overwrites the newly set oneshot timer, causing an error. > > > > Even if we add a check for `func` as suggested by @wangchdo , problems remain: > > ```c > > // executing callback... > > if (period > 0 && hrtimer->expired == expired && hrtimer->func == func) > > ``` > > > > It violates the ownership invariant when: > > 1. If a periodic timer `hrtimer->expired = t1`, `hrtimer->func = period_func` is executing, and `period_func` returns the next delay `p1`. > > 2. At time `t1`, while the hrtimer callback is executing, the periodic timer is cancelled, and a new periodic timer is set to trigger immediately with `hrtimer->expired = t1`, `hrtimer->func = period_func`, which returns the next delay `p2`. > > 3. When the cancelled previous callback finishes, while the new callback is still executing, all version checks pass, and the next timer is set to `hrtimer->expired = t1 + p1`, overwriting the new timer (the expected next expired time should be `hrtimer->expired = t1 + p2`). > > > > The correct solution is to add a monotonically increasing version field, `version`, to the hrtimer. When an hrtimer is cancelled, `version` increments by 1, ensuring that **an cancelled old timer with old version cannot overwrite a newly set timer with newer version**. However, **this would inevitably increase the memory footprint of the `hrtimer`**. I considered this approach early in the design phase but ultimately opted for `hazard pointers` to avoid the memory overhead. > > > > BTW, I feel that @wangchdo has shown little respect for me: > > 1. I and @wangchdo each implemented our hrtimer independently, with almost no communication during the process. These implementations are completely different. In mid-December, @xiaoxiang781216 suggested that I share my design with @wangchdo . During our exchange, I pointed out his errors regarding SMP, but he refused to acknowledge them and accused me of stealing his ideas. He also asked me to submit my incomplete internal hrtimer implementation to the community. > > 2. Out of respect for his opinion, I submitted my design and demonstrated that it outperforms his in both functional correctness and performance. After I submitted my code implementation to the community, to my surprise, without any discussion, @anchao forcibly merged @wangchdo's functional incorrect implementation into the upstream, which made me feel very very uncomfortable. > > 4. Yet, @wangchdo consistently refused to replace his implementation with mine. I cannot understand why he remains so stubborn. > > 5. He rarely tests his own code; even pulling his PR and compiling it reveals compilation issues. He repeatedly pressured me to provide test cases for him but never offered any in return, which I find insulting. > > 6. I have consistently tried to reason with him, presenting performance test results, interface comparisons, and memory usage analyses. However, he refuses to acknowledge them and has not provided any convincing evidence to support his stance. > > > > I recommend that @wangchdo refer to my design and adopt hazard pointers, which are an optimal solution for memory efficiency. I have no issue with him using my design consideration (in fact, hazard pointers is memory reclamation technique proposed by others). Similar design goals inevitably lead to convergent designs, and this is entirely natural. > > > > Finally, Could you please help me review my implementation https://github.com/apache/nuttx/pull/17675? I welcome any feedback and believe that, at the very least, my implementation is superior to his in terms of functional correctness, design completeness, documentation, performance, scalability, and code reusability. =) > > > > Regarding the APIs, I have made every effort to align it with the original design. However, some aspects are inherently tied to the design and difficult to change. For example, my implementation encodes the state in `hrtimer->func`, which means restarting the timer requires passing the `func`. Besides, according to test results, this does not impact performance. @Fix-Point This is my last response to you: It is just about design choice when we decide how to handle concurrency issues. When you make a decision of whether the implementation is right, you only need to consider if the behavior is what we want. Anyway, I don't want to talk about anything else not related to technical stuff. And I don't want judging anyone personally happens in our community. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
