Fix-Point commented on PR #17570:
URL: https://github.com/apache/nuttx/pull/17570#issuecomment-3677563324

   > > > > Please pay more respect to parallel programming. It does not look 
like a simple change can fix this at all.
   > > > 
   > > > 
   > > > **why this change is not simple? why fixing issue is not respecting 
you?**
   > > > > I hope that before you claim you fix the issue, you can review your 
state-machine more carefully. I must say sorry that I have not too much time to 
review your commit. I am not your z3 SMT solver who can always automatically 
generate the counter-examples for you. I think it is impossible (or very hard) 
to allow restart the timer correctly unless refactoring your state-machine 
design. At least, I can not find a solution that can ensure functional 
correctness for your design. That is why I recommend refactoring your 
implementation with #17556.
   > > > > For example, When the newly-set timer (triggered at t2) while the 
old timer callback function (triggered at t1) is executing simultaneously. This 
is a highly likely SMP scenario of thread interleaving. The old timer callback 
returns next as `UINT32_MAX`, while the new request callback returns next as 
`1000`. However, since the old timer callback detects that the hrtimer is in 
the `running` state, so it sets the next timer to `(t2 + UINT32_MAX)` instead 
of `(t1 + UINT32_MAX)`. I suspect that the reason your patch fails to run 
`ostest` on `rv-virt:smp` is also due to this issue.
   > > > 
   > > > 
   > > > **I already showed that ostest passed on rv-virt:smp64**
   > > > > ```shell
   > > > > Timeline:
   > > > > 
   > > > > CPU1 (Timer Callback)                     CPU2 (Set New Timer)
   > > > > 
   > > > > 
------|--------------------------------------|-------------------------
   > > > > 
   > > > >       |                                      |
   > > > > 
   > > > >       t1: Old timer triggers at t1           |
   > > > > 
   > > > >       |--- Callback starts                   |
   > > > > 
   > > > >       |   hrtimer->state <- running          | [Lock]
   > > > > 
   > > > >       | [Unlock]                             t2: New timer being 
start(t2)
   > > > > 
   > > > >       |                                      |--- hrtimer_start()
   > > > > 
   > > > >       |                                      |    hrtimer->state <- 
armed
   > > > > 
   > > > >       |   ...                                | [Unlock]
   > > > > 
   > > > >       |                                      | ...
   > > > > 
   > > > >       |   Callback executes...               | [Lock]
   > > > > 
   > > > >       |                                      |--- New timer triggered
   > > > > 
   > > > >       |                                      |    hrtimer->state <- 
running
   > > > > 
   > > > >       |                                      | [Unlock]
   > > > > 
   > > > >       |                                      |    Calllback 
executes....
   > > > > 
   > > > >       |                                      |
   > > > > 
   > > > >       |   Returns next = UINT32_MAX          |
   > > > > 
   > > > >       | [Lock]                               |
   > > > > 
   > > > >       | if (hrtimer->state == running)       |
   > > > > 
   > > > >       |   next expired                       | 
   > > > > 
   > > > >       |    = t2 + UINT32_MAX (wrong!)        | 
   > > > > 
   > > > >       |   hrtimer->state <- armed            | 
   > > > > 
   > > > >       | [Unlock]                             |
   > > > > 
   > > > >       |                                      |  [Lock]
   > > > > 
   > > > >       |                                      |--- hrtimer->state != 
running
   > > > > 
   > > > >       |                                      |    failed to set next 
(wrong!)
   > > > > 
   > > > >       |                                      |
   > > > > 
   > > > > 
------|--------------------------------------|-------------------------
   > > > > ```
   > > > 
   > > > 
   > > > **I don’t think the case you pointed out is an issue, as allowing an 
hrtimer that has just been re-armed to be restarted again is purely a design 
choice. My initial design allowed re-arming an already armed hrtimer. After 
considering @xiaoxiang781216’s comments, I have revised the implementation to 
treat this case as an error.**
   > > > > By the way, here is the another simple test-case that fails that I 
do not even know why. It stucked at `hrtimer_cancel_sync(&timer);`. Please fix 
this problem first.
   > > > > I must say that I am not a lab rat or your tester. Please ensure the 
functional correctness of your patches before merging it to upstream. I fully 
agree with @anchao, who pointed out my problem in #16342 before.
   > > > > > emm ... community is not a testing ground, please do not treating 
everyone like lab rats.
   > > > > 
   > > > > 
   > > > > After that, I am trying my best to ensure my PR is functional 
correctness. I hope you too.
   > > > > ```c
   > > > > 
/****************************************************************************
   > > > > 
   > > > >  * apps/examples/hello/hello_main.c
   > > > > 
   > > > >  *
   > > > > 
   > > > >  * SPDX-License-Identifier: Apache-2.0
   > > > > 
   > > > >  *
   > > > > 
   > > > >  * Licensed to the Apache Software Foundation (ASF) under one or more
   > > > > 
   > > > >  * contributor license agreements.  See the NOTICE file distributed 
with
   > > > > 
   > > > >  * this work for additional information regarding copyright 
ownership.  The
   > > > > 
   > > > >  * ASF licenses this file to you under the Apache License, Version 
2.0 (the
   > > > > 
   > > > >  * "License"); you may not use this file except in compliance with 
the
   > > > > 
   > > > >  * License.  You may obtain a copy of the License at
   > > > > 
   > > > >  *
   > > > > 
   > > > >  *   http://www.apache.org/licenses/LICENSE-2.0
   > > > > 
   > > > >  *
   > > > > 
   > > > >  * Unless required by applicable law or agreed to in writing, 
software
   > > > > 
   > > > >  * distributed under the License is distributed on an "AS IS" BASIS, 
WITHOUT
   > > > > 
   > > > >  * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  
See the
   > > > > 
   > > > >  * License for the specific language governing permissions and 
limitations
   > > > > 
   > > > >  * under the License.
   > > > > 
   > > > >  *
   > > > > 
   > > > >  
****************************************************************************/
   > > > > 
   > > > > 
   > > > > 
   > > > > 
/****************************************************************************
   > > > > 
   > > > >  * Included Files
   > > > > 
   > > > >  
****************************************************************************/
   > > > > 
   > > > > 
   > > > > 
   > > > > #include <nuttx/config.h>
   > > > > 
   > > > > #include <stdio.h>
   > > > > 
   > > > > #include <unistd.h>
   > > > > 
   > > > > 
   > > > > 
   > > > > #include <nuttx/hrtimer.h>
   > > > > 
   > > > > 
   > > > > 
   > > > > #define HRTIMER_TEST_THREAD_NR (1)
   > > > > 
   > > > > #define HRTIMER_TEST_NR        (1000000)
   > > > > 
   > > > > 
   > > > > 
   > > > > 
/****************************************************************************
   > > > > 
   > > > >  * Public Functions
   > > > > 
   > > > >  
****************************************************************************/
   > > > > 
   > > > > 
   > > > > 
   > > > > static int volatile tcount = 0;
   > > > > 
   > > > > static volatile uint32_t next = 0;
   > > > > 
   > > > > static uint32_t test_callback(FAR struct hrtimer_s *hrtimer)
   > > > > 
   > > > > {
   > > > > 
   > > > >   tcount++;
   > > > > 
   > > > >   up_ndelay(hrtimer->expired % (10 * NSEC_PER_USEC));
   > > > > 
   > > > >   return 0;
   > > > > 
   > > > > }
   > > > > 
   > > > > 
   > > > > 
   > > > > static uint32_t test_callback_background(FAR struct hrtimer_s 
*hrtimer)
   > > > > 
   > > > > {
   > > > > 
   > > > >   up_ndelay(hrtimer->expired % NSEC_PER_USEC);
   > > > > 
   > > > >   return 0;
   > > > > 
   > > > > }
   > > > > 
   > > > > 
   > > > > 
   > > > > 
   > > > > 
   > > > > static void test1(int tid)
   > > > > 
   > > > > {
   > > > > 
   > > > >   hrtimer_t   timer;
   > > > > 
   > > > >   int         count = 0;
   > > > > 
   > > > >   irqstate_t flags;
   > > > > 
   > > > >   spinlock_t lock;
   > > > > 
   > > > > 
   > > > > 
   > > > >   if (tid == 0)
   > > > > 
   > > > >     {
   > > > > 
   > > > >       hrtimer_init(&timer, test_callback, NULL);
   > > > > 
   > > > >     }
   > > > > 
   > > > >   else
   > > > > 
   > > > >     {
   > > > > 
   > > > >       hrtimer_init(&timer, test_callback_background, NULL);
   > > > > 
   > > > >     }
   > > > > 
   > > > > 
   > > > > 
   > > > >   while (count++ < HRTIMER_TEST_NR)
   > > > > 
   > > > >     {
   > > > > 
   > > > >       int ret;
   > > > > 
   > > > >       if (tid == 0)
   > > > > 
   > > > >         {
   > > > > 
   > > > >           uint64_t delay = rand() % (10 * NSEC_PER_MSEC);
   > > > > 
   > > > > 
   > > > > 
   > > > >           /* Simulate the periodical hrtimer.. */
   > > > > 
   > > > > 
   > > > > 
   > > > >           flags = spin_lock_irqsave(&lock);
   > > > > 
   > > > > 
   > > > > 
   > > > >           /* Use as periodical timer */
   > > > > 
   > > > > 
   > > > > 
   > > > >           ret = hrtimer_cancel(&timer);
   > > > > 
   > > > >           ret = hrtimer_start(&timer, 1000, HRTIMER_MODE_REL);
   > > > > 
   > > > > 
   > > > > 
   > > > >           spin_unlock_irqrestore(&lock, flags);
   > > > > 
   > > > > 
   > > > > 
   > > > >           up_ndelay(NSEC_PER_MSEC);
   > > > > 
   > > > > 
   > > > > 
   > > > >           flags = spin_lock_irqsave(&lock);
   > > > > 
   > > > > 
   > > > > 
   > > > >           ret = hrtimer_cancel(&timer);
   > > > > 
   > > > >           ret = hrtimer_start(&timer, 1000, HRTIMER_MODE_REL);
   > > > > 
   > > > >           spin_unlock_irqrestore(&lock, flags);
   > > > > 
   > > > > 
   > > > > 
   > > > >           up_ndelay(NSEC_PER_MSEC);
   > > > > 
   > > > > 
   > > > > 
   > > > >           hrtimer_cancel_sync(&timer); // stucked here????
   > > > > 
   > > > >           printf("???\n");
   > > > > 
   > > > >         }
   > > > > 
   > > > >       else
   > > > > 
   > > > >         {
   > > > > 
   > > > >           /* Simulate the background hrtimer.. */
   > > > > 
   > > > > 
   > > > > 
   > > > >           uint64_t delay = rand() % (10 * NSEC_PER_MSEC);
   > > > > 
   > > > > 
   > > > > 
   > > > >           ret = hrtimer_cancel(&timer);
   > > > > 
   > > > >           ret = hrtimer_start(&timer, delay, HRTIMER_MODE_REL);
   > > > > 
   > > > >         }
   > > > > 
   > > > > 
   > > > > 
   > > > >       UNUSED(ret);
   > > > > 
   > > > >     }
   > > > > 
   > > > > }
   > > > > 
   > > > > 
   > > > > 
   > > > > static void* test_thread(void *arg)
   > > > > 
   > > > > {
   > > > > 
   > > > >   while (1)
   > > > > 
   > > > >     {
   > > > > 
   > > > >       test1((int)arg);
   > > > > 
   > > > >     }
   > > > > 
   > > > >   return NULL;
   > > > > 
   > > > > }
   > > > > 
   > > > > 
/****************************************************************************
   > > > > 
   > > > >  * hello_main
   > > > > 
   > > > >  
****************************************************************************/
   > > > > 
   > > > > 
   > > > > 
   > > > > int main(int argc, FAR char *argv[])
   > > > > 
   > > > > {
   > > > > 
   > > > >   unsigned int   thread_id;
   > > > > 
   > > > >   pthread_attr_t attr;
   > > > > 
   > > > >   pthread_t      pthreads[HRTIMER_TEST_THREAD_NR];
   > > > > 
   > > > > 
   > > > > 
   > > > >   printf("hrtimer_test start...\n");
   > > > > 
   > > > > 
   > > > > 
   > > > >   ASSERT(pthread_attr_init(&attr) == 0);
   > > > > 
   > > > > 
   > > > > 
   > > > >   /* Create wdog test thread */
   > > > > 
   > > > > 
   > > > > 
   > > > >   for (thread_id = 0; thread_id < HRTIMER_TEST_THREAD_NR; 
thread_id++)
   > > > > 
   > > > >     {
   > > > > 
   > > > >       ASSERT(pthread_create(&pthreads[thread_id], &attr,
   > > > > 
   > > > >                             test_thread, (void *)thread_id) == 0);
   > > > > 
   > > > >     }
   > > > > 
   > > > > 
   > > > > 
   > > > >   for (thread_id = 0; thread_id < HRTIMER_TEST_THREAD_NR; 
thread_id++)
   > > > > 
   > > > >     {
   > > > > 
   > > > >       pthread_join(pthreads[thread_id], NULL);
   > > > > 
   > > > >     }
   > > > > 
   > > > > 
   > > > > 
   > > > >   ASSERT(pthread_attr_destroy(&attr) == 0);
   > > > > 
   > > > > 
   > > > > 
   > > > >   printf("hrtimer_test end...\n");
   > > > > 
   > > > >   return 0;
   > > > > 
   > > > > }
   > > > > ```
   > > > 
   > > > 
   > > > **this is a minor issue, i just forgot to set hritmer to inactive when 
period is 0 according to the state machine I designe, it is fixed now:**
   > > > ```
   > > > 
   > > > case HRTIMER_STATE_RUNNING:
   > > >             {
   > > >               /* Timer callback completed normally */
   > > > 
   > > >               if (period > 0)
   > > >                 {
   > > >                   hrtimer->expired += period;
   > > >                   hrtimer->state = HRTIMER_STATE_ARMED;
   > > >                   RB_INSERT(hrtimer_tree_s, &g_hrtimer_tree,
   > > >                             &hrtimer->node);
   > > >                 }
   > > >               else
   > > >                 {
   > > >                   hrtimer->state = HRTIMER_STATE_INACTIVE;
   > > >                 }
   > > > 
   > > >               break;
   > > >             }
   > > > ```
   > > > 
   > > > 
   > > >     
   > > >       
   > > >     
   > > > 
   > > >       
   > > >     
   > > > 
   > > >     
   > > >   
   > > > **and your new test passed on rv-virt:smp64 when this update is 
added:**
   > > > ```
   > > > NuttShell (NSH)
   > > > nsh> 
   > > > nsh> 
   > > > nsh> uname -a
   > > > NuttX 0.0.0 02057dc356-dirty Dec 20 2025 00:21:54 risc-v rv-virt
   > > > nsh> 
   > > > nsh> hello
   > > > 
   > > > ???
   > > > ???
   > > > ???
   > > > ???
   > > > ???
   > > > ???
   > > > ???
   > > > ???
   > > > ???
   > > > ???
   > > > ???
   > > > ???
   > > > ???
   > > > ???
   > > > ???
   > > > ???
   > > > ???
   > > > ???
   > > > ???
   > > > ???
   > > > ???
   > > > ???
   > > > ???
   > > > 
   > > > (...)
   > > > ```
   > > > 
   > > > 
   > > >     
   > > >       
   > > >     
   > > > 
   > > >       
   > > >     
   > > > 
   > > >     
   > > >   
   > > > **By the way, I have implemented hrtimer as an independent module, so 
even if it is not perfect, it does not impact existing NuttX functionality. I 
also want to emphasize that I respect you very much and have no intention of 
offending you. My concern is simply that spending a significant amount of time 
re-inventing the wheel—and claiming that a new implementation is superior in 
every aspect—may not be the most productive approach. In my opinion, building 
improvements on top of an existing implementation could be a more efficient and 
constructive direction.**
   > > 
   > > 
   > > I sincerely apologize for providing incorrect test cases earlier. Due to 
time constraints, I did not have the opportunity to review them promptly, which 
unfortunately wasted your time. To address this, I have now provided a 
reproducible test case that demonstrates the kind of thread interleaving I 
mentioned.
   > > ```c
   > > #include <nuttx/config.h>
   > > #include <stdio.h>
   > > 
   > > #include <nuttx/hrtimer.h>
   > > #include <sys/resource.h>
   > > 
   > > 
/****************************************************************************
   > >  * Name: hrtimer_process
   > >  *
   > >  * Description:
   > >  *   Called from the timer interrupt handler to process expired
   > >  *   high-resolution timers. If a timer has expired, its callback
   > >  *   function will be executed in the context of the timer interrupt.
   > >  *
   > >  * Input Parameters:
   > >  *   now - The current time (nsecs).
   > >  *
   > >  * Returned Value:
   > >  *   None
   > >  
****************************************************************************/
   > > 
   > > void hrtimer_process(uint64_t now);
   > > 
   > > 
/****************************************************************************
   > >  * Inline Functions
   > >  
****************************************************************************/
   > > 
   > > 
/****************************************************************************
   > >  * Name: hrtimer_gettime
   > >  *
   > >  * Description:
   > >  *   Get the current high-resolution time in nanoseconds.
   > >  *
   > >  * Returned Value:
   > >  *   Current time in nanoseconds.
   > >  
****************************************************************************/
   > > 
   > > static inline_function
   > > uint64_t hrtimer_gettime(void)
   > > {
   > >   struct timespec ts;
   > > 
   > >   /* Get current time from platform-specific timer */
   > > 
   > >   clock_systime_timespec(&ts);
   > > 
   > >   /* Convert timespec to nanoseconds */
   > > 
   > >   return clock_time2nsec(&ts);
   > > }
   > > 
   > > #define HRTIMER_TEST_THREAD_NR (4)
   > > 
   > > 
/****************************************************************************
   > >  * Public Functions
   > >  
****************************************************************************/
   > > 
   > > static volatile int count = 0;
   > > static hrtimer_t    timer;
   > > static spinlock_t   lock = SP_UNLOCKED;
   > > 
   > > static uint32_t test_callback(FAR struct hrtimer_s *hrtimer)
   > > {
   > >   irqstate_t flags;
   > >   up_ndelay(10 * NSEC_PER_USEC);
   > >   flags = spin_lock_irqsave(&lock);
   > >   count++;
   > >   spin_unlock_irqrestore(&lock, flags);
   > >   return 0;
   > > }
   > > 
   > > static uint32_t test_callback_back(FAR struct hrtimer_s *hrtimer)
   > > {
   > >   return 0;
   > > }
   > > 
   > > static void* test_thread(void *arg)
   > > {
   > >   irqstate_t  flags;
   > >   int         tid   = (int)arg;
   > >   int         ret;
   > > 
   > >   if (tid == 0)
   > >     {
   > >       hrtimer_init(&timer, test_callback, NULL);
   > >     }
   > > 
   > >   while (1)
   > >     {
   > >       if (tid == 0)
   > >         {
   > >           int tmp_count;
   > >           flags = spin_lock_irqsave(&lock);
   > >           ret = hrtimer_cancel(&timer);
   > >           ret = hrtimer_start(&timer, 0, HRTIMER_MODE_REL);
   > >           spin_unlock_irqrestore(&lock, flags);
   > > 
   > >           up_ndelay(1000);
   > > 
   > >           spin_lock_irqsave(&lock);
   > >           ret = hrtimer_cancel(&timer);
   > >           ret = hrtimer_start(&timer, 0, HRTIMER_MODE_REL);
   > >           spin_unlock_irqrestore(&lock, flags);
   > > 
   > >           ret = hrtimer_cancel_sync(&timer); // timer should be 
fully-cancelled here.
   > >           tmp_count = count; // Read the count
   > >           printf("%d\n", tmp_count);
   > >           ASSERT(tmp_count == count); // This should not assert since 
timer has been cancelled.
   > >         }
   > >       else
   > >         {
   > >           flags = up_irq_save();
   > >           hrtimer_process(hrtimer_gettime());
   > >           up_irq_restore(flags);
   > >         }
   > >     }
   > > 
   > >   UNUSED(ret);
   > > 
   > >   return NULL;
   > > }
   > > 
   > > 
/****************************************************************************
   > >  * hello_main
   > >  
****************************************************************************/
   > > 
   > > int main(int argc, FAR char *argv[])
   > > {
   > >   unsigned int   thread_id;
   > >   pthread_attr_t attr;
   > >   pthread_t      pthreads[HRTIMER_TEST_THREAD_NR];
   > > 
   > >   printf("hrtimer_test start...\n");
   > > 
   > >   ASSERT(pthread_attr_init(&attr) == 0);
   > > 
   > >   /* Create wdog test thread */
   > > 
   > >   for (thread_id = 0; thread_id < HRTIMER_TEST_THREAD_NR; thread_id++)
   > >     {
   > >       ASSERT(pthread_create(&pthreads[thread_id], &attr,
   > >                             test_thread, (void *)thread_id) == 0);
   > >     }
   > > 
   > >   for (thread_id = 0; thread_id < HRTIMER_TEST_THREAD_NR; thread_id++)
   > >     {
   > >       pthread_join(pthreads[thread_id], NULL);
   > >     }
   > > 
   > >   ASSERT(pthread_attr_destroy(&attr) == 0);
   > > 
   > >   printf("hrtimer_test end...\n");
   > >   return 0;
   > > }
   > > ```
   > > 
   > > 
   > >     
   > >       
   > >     
   > > 
   > >       
   > >     
   > > 
   > >     
   > >   
   > > In this scenario, a timer that should have been canceled continues to 
execute. **This could lead to very subtle concurrency bugs (such as 
use-after-free behavior). If such issues make their way into the production 
environment, they could result in hundreds or even thousands of man-hours being 
spent on debugging, causing significant losses**. Therefore, I find your claim 
that this is not a problem to be unconvincing.
   > > I support @anchao’s viewpoint that we should design HRTimer from the 
user's perspective. In #17556, such issues are guaranteed not to occur. As I 
have mentioned before, **there are fundamental flaws in the underlying state 
machine design of your hrtimer. Minor modifications cannot fix these issues**. 
Therefore, I believe the **most efficient approach is to replace your 
implementation with #17556**. Honestly, your attempts, in my view, are good but 
futile. Because **the root of the problem is the violation of ownership 
invariants**. If you insist on finding a correct solution through trial and 
error, I can only respect your decision and wish you the best.
   > > Regarding the accusation of “reinventing the wheel,” I must counter your 
argument—
   > > 
   > > * Strive for excellence. Today, there are numerous embedded RTOS options 
available. If NuttX aims to attract more users, it must excel in every aspect 
and prove itself superior to other RTOSes. Even a 1% optimization in CPU time 
overhead or a few bytes of memory savings can benefit tens of millions of IoT 
devices running NuttX, saving substantial energy and hardware costs. In my 
opinion, every byte and every CPU cycle should be optimized as much as 
possible. I hope you share this mindset.
   > > * As educated engineers, we should strive to deliver our best work. 
However, pursuing excellence demands significant time and effort—if you are 
familiar with the Pareto principle, you’ll understand that achieving 80% often 
requires only 1% of the effort, but perfecting the remaining 20% demands 99% of 
the effort. Please do not assume that those who pursue this level of refinement 
are wasting their time. This is extremely disrespectful.
   > > 
   > > Furthermore, as @anchao emphasized, the community is not a testing 
ground. We should at least ensure basic functional correctness before merging 
code into the main branch. Please refrain from hastily integrating flawed code 
like this in the future.
   > > Finally, over the coming weeks, I will focus on merging #17556. I do not 
have the time to repeatedly point out the fundamental issues in your underlying 
state machine design. I hope you understand. If you have the time, I would 
appreciate it if you could review my design for any functional correctness 
issues or provide suggestions for improvement. Thank you.
   > > ```shell
   > > qemu-system-riscv32 -semihosting -M virt,aclint=on -cpu rv32 -smp 8 
-bios none -nographic -kernel nuttx -s
   > > ABC
   > > NuttShell (NSH) NuttX-12.11.0
   > > nsh>
   > > nsh> hello
   > > hrtimer_test start...
   > > 1
   > > 2
   > > 3
   > > 4
   > > 5
   > > 5
   > > [CPU3] Current Version: NuttX  12.11.0 290d47b4da-dirty Dec 20 2025 
12:23:59 risc-v
   > > [CPU3] Assertion failed : at file: :0 task(CPU3): hello process: hello 
0x80017690
   > > [CPU3] EPC: 800021cc
   > > [CPU3] A0: 8002b774 A1: 00000000 A2: 00000003 A3: 80033090
   > > [CPU3] A4: 00000002 A5: 00000001 A6: 00000000 A7: 00000001
   > > [CPU3] T0: 0000002e T1: 0000006a T2: 000001ff T3: 0000004c
   > > [CPU3] T4: 00000068 T5: 00000009 T6: 0000002a
   > > [CPU3] S0: 0000018c S1: 00000000 S2: 8002b2d0 S3: 80033090
   > > [CPU3] S4: 00000000 S5: 00000000 S6: 8002d000 S7: 00002088
   > > [CPU3] S8: 00000006 S9: 00000000 S10: 00000000 S11: 00000000
   > > [CPU3] SP: 80033800 FP: 0000018c TP: 00000000 RA: 800021cc
   > > [CPU3] User Stack:
   > > [CPU3]   base: 0x80033178
   > > [CPU3]   size: 00002024
   > > [CPU3]     sp: 0x80033800
   > > [CPU3] 0x800337e0: 8002d000 800227d4 800227d4 80033090 8002b2d0 8002b774 
800324e8 8000228c
   > > [CPU3] 0x80033800: 80017690 00000000 00000000 00000002 0000000a 800338ac 
00000000 80008346
   > > [CPU3] 0x80033820: 00000000 8001e424 80033090 8002b774 00000000 00000000 
00000000 7474754e
   > > [CPU3] 0x80033840: 00000058 00000000 00000000 00000000 00000000 00000000 
00000000 00000000
   > > [CPU3] 0x80033860: 00000000 00000000 00000000 00000000 2e323100 302e3131 
80025000 8002d000
   > > [CPU3] 0x80033880: 8002d000 393257d4 37346430 61643462 7269642d 44207974 
32206365 30322030
   > > [CPU3] 0x800338a0: 31203532 33323a32 0039353a 00000002 8001cec0 8001ce6a 
736972de 00762d63
   > > [CPU3] 0x800338c0: 00002088 00000005 800257d4 8001c372 00000000 00000000 
00000000 00000000
   > > [CPU3] 0x800338e0: 00000000 8002d404 80025000 8002d000 8002d000 00000005 
8002d000 80007bcc
   > > [CPU3] 0x80033900: 00000001 00000000 ee6b2800 80017798 00000000 00000000 
00000000 00000000
   > > [CPU3] 0x80033920: 00000000 00000000 00000000 00000000 00000000 00000000 
80033090 8001bf0c
   > > [CPU3] 0x80033940: 00000000 00000000 00000000 8001d36a 00000000 00000000 
00000000 00000000
   > > [CPU3] 0x80033960: 00000000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000
   > > nsh>
   > > ```
   > 
   > **You miss used hrtimer, the hrtime_process is not an external api to be 
used in user's task, it should be called from isr, i updated your test code a 
litttl bit, and the test succeeded:**
   > 
   > ```
   > #include <nuttx/config.h>
   > #include <stdio.h>
   > 
   > #include <nuttx/hrtimer.h>
   > 
   > #define HRTIMER_TEST_THREAD_NR (4)
   > 
   > 
/****************************************************************************
   >  * Public Functions
   >  
****************************************************************************/
   > 
   > static volatile int count = 0;
   > static hrtimer_t    timer;
   > static spinlock_t   lock = SP_UNLOCKED;
   > 
   > static uint32_t test_callback(FAR struct hrtimer_s *hrtimer)
   > {
   >   irqstate_t flags;
   >   up_ndelay(10 * NSEC_PER_USEC);
   >   flags = spin_lock_irqsave(&lock);
   >   count++;
   >   spin_unlock_irqrestore(&lock, flags);
   >   return 0;
   > }
   > 
   > static uint32_t test_callback_back(FAR struct hrtimer_s *hrtimer)
   > {
   >   return 0;
   > }
   > 
   > static void* test_thread(void *arg)
   > {
   >   irqstate_t  flags;
   >   int         tid   = (int)arg;
   >   int         ret;
   > 
   >   if (tid == 0)
   >     {
   >       hrtimer_init(&timer, test_callback, NULL);
   >     }
   > 
   >   while (1)
   >     {
   >       if (tid == 0)
   >         {
   >           int tmp_count;
   >           flags = spin_lock_irqsave(&lock);
   >           ret = hrtimer_cancel(&timer);
   >           ret = hrtimer_start(&timer, 0, HRTIMER_MODE_REL);
   >           spin_unlock_irqrestore(&lock, flags);
   > 
   >           up_ndelay(1000);
   > 
   >           spin_lock_irqsave(&lock);
   >           ret = hrtimer_cancel(&timer);
   >           ret = hrtimer_start(&timer, 0, HRTIMER_MODE_REL);
   >           spin_unlock_irqrestore(&lock, flags);
   > 
   >           ret = hrtimer_cancel_sync(&timer); // timer should be 
fully-cancelled here.
   >           tmp_count = count; // Read the count
   >           printf("%d\n", tmp_count);
   >           ASSERT(tmp_count == count); // This should not assert since 
timer has been cancelled.
   >         }
   >       else
   >         {
   >           flags = up_irq_save();
   >           up_irq_restore(flags);
   >         }
   >     }
   > 
   >   UNUSED(ret);
   > 
   >   return NULL;
   > }
   > 
   > 
/****************************************************************************
   >  * hello_main
   >  
****************************************************************************/
   > 
   > int main(int argc, FAR char *argv[])
   > {
   >   unsigned int   thread_id;
   >   pthread_attr_t attr;
   >   pthread_t      pthreads[HRTIMER_TEST_THREAD_NR];
   > 
   >   printf("hrtimer_test start...\n");
   > 
   >   ASSERT(pthread_attr_init(&attr) == 0);
   > 
   >   /* Create wdog test thread */
   > 
   >   for (thread_id = 0; thread_id < HRTIMER_TEST_THREAD_NR; thread_id++)
   >     {
   >       ASSERT(pthread_create(&pthreads[thread_id], &attr,
   >                             test_thread, (void *)thread_id) == 0);
   >     }
   > 
   >   for (thread_id = 0; thread_id < HRTIMER_TEST_THREAD_NR; thread_id++)
   >     {
   >       pthread_join(pthreads[thread_id], NULL);
   >     }
   > 
   >   ASSERT(pthread_attr_destroy(&attr) == 0);
   > 
   >   printf("hrtimer_test end...\n");
   >   return 0;
   > }
   > ```
   > 
   > **test log on rv-virt:smp64:**
   > 
   > ```
   > NuttShell (NSH)
   > nsh> 
   > nsh> uname -a
   > NuttX 0.0.0 9bcc6ae543-dirty Dec 20 2025 15:05:57 risc-v rv-virt
   > nsh> hello
   > (...)
   > 11322
   > 11324
   > 11326
   > 11328
   > 11330
   > 11332
   > 11334
   > 11336
   > 11338
   > 11340
   > 11342
   > 11344
   > 11346
   > 11348
   > 11350
   > 11352
   > 11354
   > 11356
   > 11358
   > 11360
   > (...)
   > ```
   
   I am afraid that this does not appear to be an issue caused by 
`hrtimer_process(hrtimer_gettime());`. The call is only intended to make it 
easier and faster to reproduce the thread interleaving scenario I described 
earlier.  
   When you remove the `ASSERT` on line 108, the program executes normally, 
which indicates that calling `hrtimer_process` here is OK.
   This test case is designed to demonstrate that the following interleaving 
scenario can occur during actual concurrency expiration handling:  
   
   1. **Core 1** is executing the old timer callback function.
   2. **Core 0** restarts a new timer.
   3. **Core 2** execute timer expiration, and it begins executing the new 
timer callback function.
   4. **Core 1** finishes its old callback execution and sets `hrtimer->state` 
to **INACTIVE**.  
   5. **Core 0** attempts to cancel the timer, finds `hrtimer->state == 
INACTIVE`, and assumes the timer has completed. However, **Core 2** is still 
executing the new timer callback, which may lead to a **use-after-free** issue.
   
   The root-cause is the **violation of the ownership invariants**, which 
allows two different owners modify the same timer. In my opinion, this problem 
cannot be solved unless the entire state machine is redesigned.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to