Old:

while (tickNextIdle < curTick)
    tickNextIdle += clock;

New:
if (tickNextIdle < curTick)
    tickNextIdle = (curTick / clock) * clock + clock;

We actually went back and forth on this. The question is: when is the divide better than the loop? We never profiled the code, so we have no evidence that one is better than the other. It sounds like you're saying the divide is better. We had even considered trying the add up to, say, four times, and falling back to the divide/multiply only if that wasn't enough. (That way busy buses pay only the cost of an add or two, and slow buses escape with a single divide/multiply.) Would you be willing to try something like this?

if (tickNextIdle < curTick)
    tickNextIdle += clock;

if (tickNextIdle < curTick)
    tickNextIdle += clock;

if (tickNextIdle < curTick)
    tickNextIdle += clock;

if (tickNextIdle < curTick)
    tickNextIdle += clock;

if (tickNextIdle < curTick)
    tickNextIdle = (curTick / clock) * clock + clock;
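
For anyone who wants to try it in isolation, here is the same idea wrapped up as a small free function. This is only a sketch: the name advanceNextIdle is made up for illustration, and Tick is assumed to be an unsigned 64-bit count rather than taken from the M5 headers.

```cpp
#include <cstdint>

// Assumption: a stand-in for the simulator's tick type.
using Tick = std::uint64_t;

// Hypothetical helper (not an M5 function): returns the updated
// tickNextIdle for a bus with the given clock period.
inline Tick advanceNextIdle(Tick tickNextIdle, Tick curTick, Tick clock)
{
    // Fast path: a busy bus is usually only a cycle or two behind,
    // so try up to four cheap adds first.
    for (int i = 0; i < 4 && tickNextIdle < curTick; ++i)
        tickNextIdle += clock;

    // Slow path: still behind after four adds, so jump straight to
    // the first multiple of clock that is strictly after curTick.
    if (tickNextIdle < curTick)
        tickNextIdle = (curTick / clock) * clock + clock;

    return tickNextIdle;
}
```

One thing to keep in mind when comparing results: the divide is not an exact replacement for the loop. It always lands on a multiple of clock, and when curTick is itself a multiple of clock it lands one period later than the loop would. For example, starting from 100 with clock 10 and curTick 1000, the loop stops at 1000 while the divide gives 1010.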



It really has been quite a while since anyone in the core group has profiled M5, so we'd appreciate this sort of help from people! If you can identify something that's slow and what fraction of the total execution time it accounts for, that'd be fantastic. Suggestions for improvement are an added bonus. A statement about what kind of speedup is achieved is icing on the cake.
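
As a starting point for gathering that kind of evidence, a rough micro-benchmark along these lines could compare the two update styles. It is a sketch under assumptions: timeUpdate, loopUpdate, divideUpdate and BenchResult are names invented here, Tick is assumed to be an unsigned 64-bit count, and the synthetic stride only approximates how far a real bus falls behind. A profile of a full simulation run would be more convincing than numbers from a loop like this.

```cpp
#include <chrono>
#include <cstdint>

// Assumption: a stand-in for the simulator's tick type.
using Tick = std::uint64_t;

// Elapsed wall-clock time plus a checksum, so the compiler cannot
// optimize the timed loop away.
struct BenchResult {
    double seconds;
    Tick checksum;
};

// The original loop-based update.
inline Tick loopUpdate(Tick tickNextIdle, Tick curTick, Tick clock)
{
    while (tickNextIdle < curTick)
        tickNextIdle += clock;
    return tickNextIdle;
}

// The divide/multiply-based update.
inline Tick divideUpdate(Tick tickNextIdle, Tick curTick, Tick clock)
{
    if (tickNextIdle < curTick)
        tickNextIdle = (curTick / clock) * clock + clock;
    return tickNextIdle;
}

// Advances curTick by `stride` each iteration and applies `update`.
// A small stride models a busy bus, a large one a mostly idle bus.
template <typename Update>
BenchResult timeUpdate(Update update, Tick clock, Tick stride, int iterations)
{
    Tick tickNextIdle = 0;
    Tick curTick = 0;
    Tick checksum = 0;

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        curTick += stride;
        tickNextIdle = update(tickNextIdle, curTick, clock);
        checksum += tickNextIdle;
    }
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    return {elapsed.count(), checksum};
}
```

Calling, say, timeUpdate(loopUpdate, 10, 25, 10000000) and timeUpdate(divideUpdate, 10, 25, 10000000) with a few different strides should give a feel for where the crossover is. The checksums of the two variants are not expected to match, since the divide can land one clock period later than the loop.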

Thanks,

  Nate
_______________________________________________
m5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
