Re: [linux-sunxi] Re: [PATCH v3 1/2] arm64: arch_timer: Workaround for Allwinner A64 timer instability

Roman Beránek Thu, 12 Dec 2019 08:34:31 -0800


On Wednesday, December 4, 2019 at 5:19:23 AM UTC+1, Vasily Khoruzhick wrote:
>
> On Mon, Jan 14, 2019 at 1:25 AM Marc Zyngier <[email protected]> wrote:
> > 
> > Hi Samuel, 
>
> Hi Samuel, 
>
> > On 13/01/2019 02:17, Samuel Holland wrote: 
> > > The Allwinner A64 SoC is known[1] to have an unstable architectural
> > > timer, which manifests itself most obviously in the time jumping forward
> > > a multiple of 95 years[2][3]. This coincides with 2^56 cycles at a
> > > timer frequency of 24 MHz, implying that the time went slightly backward
> > > (and this was interpreted by the kernel as it jumping forward and
> > > wrapping around past the epoch).
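The 95-year figure quoted above can be checked with a few lines of C. This is only a sketch of the arithmetic: the helper name and the use of a 365.25-day year are assumptions for illustration, while the 2^56 cycle count and the 24 MHz frequency come from the quoted message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a Julian year of 365.25 days is used for the conversion. */
#define SECONDS_PER_YEAR (365.25 * 24.0 * 3600.0)

/* Hypothetical helper: convert a cycle count at a given counter
 * frequency (in Hz) into years. */
static double cycles_to_years(uint64_t cycles, uint64_t hz)
{
    assert(hz != 0);
    return (double)cycles / (double)hz / SECONDS_PER_YEAR;
}
```

Calling cycles_to_years(1ULL << 56, 24000000ULL) gives roughly 95.1, which agrees with the jump size reported by users.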
> > > 
> > > Investigation revealed instability in the low bits of CNTVCT at the
> > > point a high bit rolls over. This leads to power-of-two cycle forward
> > > and backward jumps. (Testing shows that forward jumps are about twice as
> > > likely as backward jumps.) Since the counter value returns to normal
> > > after an indeterminate read, each "jump" really consists of both a
> > > forward and backward jump from the software perspective.
> > > 
> > > Unless the kernel is trapping CNTVCT reads, a userspace program is able
> > > to read the register in a loop faster than it changes. A test program
> > > running on all 4 CPU cores that reported jumps larger than 100 ms was
> > > run for 13.6 hours and reported the following:
> > > 
> > >  Count | Event 
> > > -------+--------------------------- 
> > >   9940 | jumped backward      699ms 
> > >    268 | jumped backward     1398ms 
> > >      1 | jumped backward     2097ms 
> > >  16020 | jumped forward       175ms 
> > >   6443 | jumped forward       699ms 
> > >   2976 | jumped forward      1398ms 
> > >      9 | jumped forward    356516ms 
> > >      9 | jumped forward    357215ms 
> > >      4 | jumped forward    714430ms 
> > >      1 | jumped forward   3578440ms 
> > > 
> > > This works out to a jump larger than 100 ms about every 5.5 seconds on 
> > > each CPU core. 
> > > 
> > > The largest jump (almost an hour!) was the following sequence of reads:
> > >     0x0000007fffffffff → 0x00000093feffffff → 0x0000008000000000
> > > 
> > > Note that the middle bits don't necessarily all read as all zeroes or
> > > all ones during the anomalous behavior; however the low 10 bits checked
> > > by the function in this patch have never been observed with any other
> > > value.
> > > 
> > > Also note that smaller jumps are much more common, with backward jumps
> > > of 2048 (2^11) cycles observed over 400 times per second on each core.
> > > (Of course, this is partially explained by lower bits rolling over more
> > > frequently.) Any one of these could have caused the 95 year time skip.
> > > 
> > > Similar anomalies were observed while reading CNTPCT (after patching the
> > > kernel to allow reads from userspace). However, the CNTPCT jumps are
> > > much less frequent, and only small jumps were observed. The same program
> > > as before (except now reading CNTPCT) observed after 72 hours:
> > > 
> > >  Count | Event 
> > > -------+--------------------------- 
> > >     17 | jumped backward      699ms 
> > >     52 | jumped forward       175ms 
> > >   2831 | jumped forward       699ms 
> > >      5 | jumped forward      1398ms 
> > > 
> > > Further investigation showed that the instability in CNTPCT/CNTVCT also
> > > affected the respective timer's TVAL register. The following values were
> > > observed immediately after writing CNTV_TVAL to 0x10000000:
> > > 
> > >  CNTVCT             | CNTV_TVAL  | CNTV_CVAL          | CNTV_TVAL Error
> > > --------------------+------------+--------------------+-----------------
> > >  0x000000d4a2d8bfff | 0x10003fff | 0x000000d4b2d8bfff | +0x00004000 
> > >  0x000000d4a2d94000 | 0x0fffffff | 0x000000d4b2d97fff | -0x00004000 
> > >  0x000000d4a2d97fff | 0x10003fff | 0x000000d4b2d97fff | +0x00004000 
> > >  0x000000d4a2d9c000 | 0x0fffffff | 0x000000d4b2d9ffff | -0x00004000 
> > > 
> > > The pattern of errors in CNTV_TVAL seemed to depend on exactly which 
> > > value was written to it. For example, after writing 0x10101010: 
> > > 
> > >  CNTVCT             | CNTV_TVAL  | CNTV_CVAL          | CNTV_TVAL Error
> > > --------------------+------------+--------------------+-----------------
> > >  0x000001ac3effffff | 0x1110100f | 0x000001ac4f10100f | +0x1000000 
> > >  0x000001ac40000000 | 0x1010100f | 0x000001ac5110100f | -0x1000000 
> > >  0x000001ac58ffffff | 0x1110100f | 0x000001ac6910100f | +0x1000000 
> > >  0x000001ac66000000 | 0x1010100f | 0x000001ac7710100f | -0x1000000 
> > >  0x000001ac6affffff | 0x1110100f | 0x000001ac7b10100f | +0x1000000 
> > >  0x000001ac6e000000 | 0x1010100f | 0x000001ac7f10100f | -0x1000000 
> > > 
> > > I was also twice able to reproduce the issue covered by Allwinner's
> > > workaround[4], that writing to TVAL sometimes fails, and both CVAL and
> > > TVAL are left with entirely bogus values. One was the following values:
> > > 
> > >  CNTVCT             | CNTV_TVAL  | CNTV_CVAL 
> > > --------------------+------------+--------------------------------------
> > >  0x000000d4a2d6014c | 0x8fbd5721 | 0x000000d132935fff (615s in the past)
> > > 
> > > ========================================================================
> > > 
> > > Because the CPU can read the CNTPCT/CNTVCT registers faster than they
> > > change, performing two reads of the register and comparing the high bits
> > > (like other workarounds) is not a workable solution. And because the
> > > timer can jump both forward and backward, no pair of reads can
> > > distinguish a good value from a bad one. The only way to guarantee a
> > > good value from consecutive reads would be to read _three_ times, and
> > > take the middle value only if the three values are 1) each unique and
> > > 2) increasing. This takes at minimum 3 counter cycles (125 ns), or more
> > > if an anomaly is detected.
> > > 
> > > However, since there is a distinct pattern to the bad values, we can
> > > optimize the common case (1022/1024 of the time) to a single read by
> > > simply ignoring values that match the error pattern. This still takes no
> > > more than 3 cycles in the worst case, and requires much less code. As an
> > > additional safety check, we still limit the loop iteration to the number
> > > of max-frequency (1.2 GHz) CPU cycles in three 24 MHz counter periods.
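The filter described in the quoted paragraph can be sketched in user-space C roughly as follows. This is an illustration of the idea, not the patch itself: counter_read_fn, struct replay, replay_read and read_counter_filtered are hypothetical names, and the replay reader only stands in for a real CNTVCT read so the logic can be exercised off-target. The 10-bit pattern and the retry budget (three counter periods at 1200 MHz / 24 MHz = 50 CPU cycles each) are taken from the quoted text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Retry budget from the quoted text: CPU cycles at 1.2 GHz that fit in
 * three periods of the 24 MHz counter. */
#define RETRY_BUDGET (3 * (1200 / 24))

/* Hypothetical stand-in for a raw counter read; not a kernel API. */
typedef uint64_t (*counter_read_fn)(void *ctx);

/* Test scaffolding (assumption): replays a fixed list of counter values,
 * repeating the last one once the list is exhausted. */
struct replay {
    const uint64_t *vals;
    size_t n;
    size_t i;
};

static uint64_t replay_read(void *ctx)
{
    struct replay *r = ctx;
    uint64_t v = r->vals[r->i];

    if (r->i + 1 < r->n)
        r->i++;
    return v;
}

/* True when the low 10 bits are all zeros or all ones, i.e. the value
 * matches the error pattern described in the quoted message. */
static int low_bits_suspect(uint64_t val)
{
    return ((val + 1) & 0x3ffu) <= 1;
}

/* Re-read until the value no longer matches the suspect pattern, or the
 * retry budget runs out; the last value read is returned either way. */
static uint64_t read_counter_filtered(counter_read_fn read, void *ctx,
                                      int max_retries)
{
    uint64_t val;

    assert(read != NULL && max_retries > 0);
    do {
        val = read(ctx);
    } while (low_bits_suspect(val) && --max_retries > 0);
    return val;
}
```

Only 2 of the 1024 possible low-bit patterns are rejected, which is where the 1022/1024 single-read figure comes from. Note that the filter trusts any value whose low 10 bits look ordinary, so it cannot catch a bad read that falls outside the observed pattern.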
> > > 
> > > For the TVAL registers, the simple solution is to not use them. Instead,
> > > read or write the CVAL and calculate the TVAL value in software.
> > > 
> > > Although the manufacturer is aware of at least part of the erratum[4],
> > > there is no official name for it. For now, use the kernel-internal name
> > > "UNKNOWN1".
> > > 
> > > [1]: https://github.com/armbian/build/commit/a08cd6fe7ae9 
> > > [2]: https://forum.armbian.com/topic/3458-a64-datetime-clock-issue/ 
> > > [3]: https://irclog.whitequark.org/linux-sunxi/2018-01-26 
> > > [4]: https://github.com/Allwinner-Homlet/H6-BSP4.9-linux/blob/master/drivers/clocksource/arm_arch_timer.c#L272
>  
> > 
> > nit: In general, I'm not overly keen on URLs in commit messages, as they 
> > may vanish without notice and the commit message becomes less useful. In 
> > the future, please keep those in the cover letter (though in this 
> > particular case, the commit message explains the issue pretty well, so 
> > no harm done once GitHub dies a horrible death... ;-). 
> > 
> > The fix itself looks pretty solid, and will hopefully make the 
> > "AllLoosers" HW more usable. 
>
> Unfortunately this patch doesn't completely eliminate the jumps. There 
> have been reports from users who still saw 95y jump even with the 
> patch applied. 
>
> Personally I've seen it once or twice on my Pine64-LTS. 
>

I can confirm that. Our team at Prusa Research has built a printer on top of
the A64, and as soon as test production reached a few dozen units, bug
reports started coming in.
 

>
> Looks like we need bigger hammer. Does anyone have any idea what it could 
> be? 
>

We decided to apply the QorIQ erratum A-008585 (FSL_ERRATUM_A008585)
workaround instead, and that solved the issue for us: over a thousand units
have been shipped to our customers, and so far so good.
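
Roughly, that workaround re-reads the counter until two back-to-back reads return the same value, with a cap on the number of attempts. Below is a minimal user-space sketch of the idea, not the kernel code: counter_read_fn, struct replay, replay_read and read_counter_stable are hypothetical names, and the replay reader only stands in for a real counter read so the loop can be tried off-target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a raw counter read; not a kernel API. */
typedef uint64_t (*counter_read_fn)(void *ctx);

/* Test scaffolding (assumption): replays a fixed list of counter values,
 * repeating the last one once the list is exhausted. */
struct replay {
    const uint64_t *vals;
    size_t n;
    size_t i;
};

static uint64_t replay_read(void *ctx)
{
    struct replay *r = ctx;
    uint64_t v = r->vals[r->i];

    if (r->i + 1 < r->n)
        r->i++;
    return v;
}

/* Read the counter in pairs until both reads of a pair agree, or the
 * retry cap is reached; the second read of the last pair is returned. */
static uint64_t read_counter_stable(counter_read_fn read, void *ctx,
                                    int max_retries)
{
    uint64_t prev, cur;

    assert(read != NULL && max_retries > 0);
    do {
        prev = read(ctx);
        cur = read(ctx);
    } while (prev != cur && --max_retries > 0);
    return cur;
}
```

Unlike a filter on the low bits, this check does not depend on what a bad value looks like, only on a bad read not repeating identically on the very next read. It relies on the CPU being able to read the register twice within one counter period, which the quoted message says is the case on the A64.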
 

>
> Regards, 
> Vasily 
>
>
> > Reviewed-by: Marc Zyngier <[email protected]>
> > 
> > Daniel, please consider this for v5.1. 
> > 
> > Thanks, 
> > 
> >         M. 
> > -- 
> > Jazz is not dead. It just smells funny... 
> > 
> > -- 
> > You received this message because you are subscribed to the Google Groups "linux-sunxi" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> > For more options, visit https://groups.google.com/d/optout. 
>

Best regards 
Roman B.

-- 
You received this message because you are subscribed to the Google Groups 
"linux-sunxi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web, visit 
https://groups.google.com/d/msgid/linux-sunxi/4005fa8b-6f72-4f2a-b8cb-669c2eb8c067%40googlegroups.com.
