On Thu, Oct 13, 2016 at 11:21 AM, Jakob Viketoft <
> *From:* Joel Sherrill [j...@rtems.org]
> *Sent:* Thursday, October 13, 2016 17:38
> *To:* Jakob Viketoft
> *Cc:* email@example.com
> *Subject:* Re: Time spent in ticks...
> >I don't have an or1k handy so ran on a sparc/erc32 simulator.
> >It is a SPARC v7 at 15 MHz.
> >These times are in microseconds and based on the tmtests,
> >specifically tm08 and tm27.
> >(1) rtems_clock_tick: only case - 52
> >(2) rtems interrupt: entry overhead returns to interrupted task - 12
> >(3) rtems interrupt: exit overhead returns to interrupted task - 4
> >(4) rtems interrupt: entry overhead returns to nested interrupt - 11
> >(5) rtems interrupt: exit overhead returns to nested interrupt - 3
The above was from the master with SMP enabled. I repeated it with
SMP disabled and it had no impact.
Since the timing change is post 4.11, I decided to try 4.11 with SMP
rtems_clock_tick: only case - 42
rtems interrupt: entry overhead returns to interrupted task - 11
rtems interrupt: exit overhead returns to interrupted task - 4
rtems interrupt: entry overhead returns to nested interrupt - 11
rtems interrupt: exit overhead returns to nested interrupt - 3
So 42 + 11 + 4 = 57 microseconds, 57 * 15 = 855 cycles
So the overhead has gone up some but as Pavel says it is quite likely
some mathematical operation on 64 bit types is slow on your CPU.
HINT: If you can write a benchmark for 64-bit operations,
it would be a good comparison between CPUs and might
highlight where the software implementation needs improvement.
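
A minimal sketch of such a benchmark, assuming nothing beyond standard C.
The names (op_add, op_mul, op_div, time_op_us) are made up for illustration,
and clock() is only a stand-in: on a real target you would probably read a
hardware counter instead. The operands go through volatile variables so the
compiler cannot fold the operation away at -O3.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Signature shared by all 64-bit operations under test. */
typedef uint64_t (*op64_fn)(uint64_t, uint64_t);

static uint64_t op_add(uint64_t a, uint64_t b) { return a + b; }
static uint64_t op_mul(uint64_t a, uint64_t b) { return a * b; }
static uint64_t op_div(uint64_t a, uint64_t b) { assert(b != 0); return a / b; }

/* Volatile sink so the result of each operation is actually stored. */
static volatile uint64_t sink;

/*
 * Return the average time per call of op in microseconds, measured with
 * clock(). This is an assumption for portability; the resolution of
 * clock() may be too coarse on some systems.
 */
static double time_op_us(op64_fn op, uint64_t a, uint64_t b, uint32_t iterations)
{
  volatile uint64_t va = a; /* re-read every iteration, no constant folding */
  volatile uint64_t vb = b;
  clock_t start;
  clock_t end;
  uint32_t i;

  assert(iterations > 0);

  start = clock();
  for (i = 0; i < iterations; ++i) {
    sink = op(va, vb);
  }
  end = clock();

  return ((double) (end - start) * 1e6 / CLOCKS_PER_SEC) / iterations;
}
```

Calling time_op_us(op_div, ...) next to time_op_us(op_add, ...) on each CPU
should show whether 64-bit division stands out. The loop and call overhead
is included in every figure, so only the differences between operations are
meaningful.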
> >The clock tick test has 100 tasks but it looks like they are blocked on a
> >without timeout.
> >Your times look WAY too high. Maybe the interrupt is stuck on and
> >not being cleared.
> >On the erc32, a nominal "nothing to do clock tick" would be 1+2+3 from
> >above or 52+12+4 = 68 microseconds. 68 * 15 = 1020 machine cycles.
> >So at a higher clock rate, it should be even less time.
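
To compare across clock rates it helps to convert to cycles. A tiny sketch
of the arithmetic used above (us_to_cycles is a hypothetical helper name, not
an RTEMS function):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a duration in microseconds to CPU cycles for a clock rate given
 * in MHz: one microsecond at N MHz is N cycles.
 */
static uint32_t us_to_cycles(uint32_t microseconds, uint32_t clock_mhz)
{
  return microseconds * clock_mhz;
}
```

With the figures in this thread, us_to_cycles(68, 15) gives the 1020 cycles
quoted for the erc32, while the ~320 us at 50 MHz reported further down works
out to us_to_cycles(320, 50), about 16000 cycles.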
> >My gut feeling is that something is wrong with the ISR handler
> >and it is stuck. But the times are definitely way too high.
> (Sorry if the format got somewhat garbled; anything but top-posting has
> to be done manually...)
> I re-tested my case using an -O3 optimization (we have been using -O0
> during development for debugging purposes) and I got a good performance
> boost, but I'm still nowhere near your numbers. I can vouch that the
> interrupt (an exception, really) isn't stuck; the code unfortunately just
> takes a long time to compute. I have a subsecond counter (1/16 of a second)
> which I'm sampling at various places in the code, storing its numbers to a
> buffer in memory so as to interfere with the program as little as possible.
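
For reference, a rough sketch of that kind of low-intrusion sampling. All
names here (trace_sample, trace_record, trace_delta) are hypothetical and not
taken from the actual code; the caller is assumed to read the hardware counter
itself and pass the raw value in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_CAPACITY 64u

typedef struct {
  uint32_t stamp; /* raw counter value at the sample point */
  uint16_t tag;   /* caller-chosen id of the sample point */
} trace_sample;

static trace_sample trace_buf[TRACE_CAPACITY];
static size_t trace_count;

/*
 * Store one sample. Only a bounds check and two stores, so the probe
 * itself costs very little. Samples beyond the capacity are dropped.
 */
static void trace_record(uint16_t tag, uint32_t counter_value)
{
  if (trace_count < TRACE_CAPACITY) {
    trace_buf[trace_count].stamp = counter_value;
    trace_buf[trace_count].tag = tag;
    ++trace_count;
  }
}

/*
 * Counter ticks elapsed between two recorded samples. Unsigned
 * subtraction gives the right answer across a single counter wrap.
 */
static uint32_t trace_delta(size_t first, size_t second)
{
  assert(first < trace_count && second < trace_count);
  return trace_buf[second].stamp - trace_buf[first].stamp;
}
```

The deltas are evaluated after the run, so nothing is formatted or printed
inside the tick handler.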
> With -O3, handling a tick still takes ~320 us, but the weight
> has now shifted. tc_windup takes ~214 us and the rest is obviously
> _Watchdog_Tick(). When splitting up the tc_windup function to find the worst
> speed bumps, the biggest contribution (~122 us) seems to come from the scale
> factor recalculation. Since it's 64-bit arithmetic, it's turned into a software
> function, which can apparently be quite time-consuming.
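
To make the cost concrete, here is a simplified sketch of what a scale factor
recalculation of this kind looks like. It assumes the code follows the usual
timecounter approach of computing 2^64 / frequency; the function name is
made up and the frequency adjustment term is left out, so this is not the
actual tc_windup code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Approximate 2^64 / frequency_hz. Since 2^64 does not fit in 64 bits,
 * divide 2^63 by the frequency and double the quotient.
 */
static uint64_t scale_from_frequency(uint64_t frequency_hz)
{
  uint64_t scale;

  assert(frequency_hz != 0);

  scale = (uint64_t) 1 << 63;
  scale /= frequency_hz; /* 64-bit by 64-bit unsigned division */
  return scale * 2;
}
```

On a 32-bit target without a 64-bit divide instruction, GCC typically lowers
that division to a libgcc helper such as __udivdi3, which would be consistent
with a single line dominating the measurement.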
> Even though _Watchdog_Tick() "only" takes ~100 us now, it still sounds much
> higher than your total tick time on a slower system (we're running at 50 MHz).
> Is there anything we can do to improve these numbers? Is Clock_isr
> intended to be run uninterrupted as it is now? Can't see that much of the
> BSP patch code has anything to do with the speed of what I'm looking at
> right now...
> *Jakob Viketoft*, Senior Engineer in RTL and embedded software
> ÅAC Microtec AB
> Dag Hammarskjölds väg 48
> SE-751 83 Uppsala, Sweden
> T: +46 702 80 95 97