xiaoxiang781216 commented on code in PR #15929: URL: https://github.com/apache/nuttx/pull/15929#discussion_r1984406661
########## arch/risc-v/src/common/riscv_mtimer.c: ########## @@ -222,8 +221,11 @@ static int riscv_mtimer_start(struct oneshot_lowerhalf_s *lower, flags = up_irq_save(); mtime = riscv_mtimer_get_mtime(priv); - alarm = mtime + ts->tv_sec * priv->freq + - ts->tv_nsec * priv->freq / NSEC_PER_SEC; + + /* Align each tick start to cycle_per_tick boundary to avoid clock drift */ + + alarm = mtime / priv->cycle_per_tick * priv->cycle_per_tick + Review Comment: > This is wrong, and would add drift to the timer. The "mtime" here is not the value where the compare was set previously. It is some arbitrary time after that. The time between is unknown. The drift will eventually cause random errors in wdog functionality (e.g. sleeping for 1 tick may randomly timeout in less than 1 tick time), which then again breaks all the HW drivers using the tick based sleep as a timeout timer. > here is a possible fix : ``` alarm = mtime + (tick * priv->freq + TICK_PER_SEC - 1) / TICK_PER_SEC; ``` > This is the main reason why it is smart idea to only allow tick times which are even multiples of the underlying hw timer; you know the previous tick start by always aligning the current mtime to the tick time boundary. > even multiples is good, but it mayn't possible with the real chip design. > There is a way around this by using priv->freq as well, by specifically storing the previously set compare time, and always calculate the needed amount of ticks to progress by using the current mtime and previously set compare value.... > > I didn't do all of this, as I still don't understand why you can't just allow the inaccuracy in a single tick cycle time. > It impossible to achieve the accuracy within 1 tick for any sleep/timeout/wdog, that's why wdog always increase 1 tick: https://github.com/apache/nuttx/blob/master/sched/wdog/wd_start.c#L295 so, I never think one single tick error is a problem. The suggestion I made here is only to improve the peformance. > What is the practical problem you are trying to solve?? What's I really care is the drift and accuracy from clock_gettime(riscv_mtimer_current). > I can, however, make a suggestion tomorrow following that path if you insist. > > I see that there is again changes to arm64 tick timer. I hope this time drift was not added _again_ there. It is painful to debug because it causes very random failures in HW drivers due to timeout timers firing too early randomly. > the early wakeup could be fixed by tick+1 and roundup hardware count. > But again, adding this drift to the timer is a horrible mistake, which is repeated over and over again with these timers. Yes, I agree that timing error is hard to fix. so, let's come up a common solution to fix both arm and riscv platform. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@nuttx.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org