OK. Right now I really don't have the time to do any real profiling (e.g. with gprof), but with this simple _integer_ divide and multiply I got very good performance: much better than the original b2, and almost as fast as v1.1. So I'm happy with it for now. I first need to get my code working, and after that I'll invest a couple of days in profiling. BTW, integer divide and multiply are quite inexpensive. You do a lot more expensive stuff in other places, I'm sure. :)
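For anyone who wants to compare the variants discussed in this thread side by side, here is a minimal sketch. It is not the actual M5 bus code: the `Tick` typedef, the free-function form and the names `advanceLoop`, `advanceDivide` and `advanceHybrid` are assumptions made for illustration, and `clock` is assumed to be non-zero.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: ticks are unsigned 64-bit counters (hypothetical typedef).
using Tick = std::uint64_t;

// Loop variant: add one clock period at a time until tickNextIdle
// is no longer behind curTick.
Tick advanceLoop(Tick tickNextIdle, Tick curTick, Tick clock) {
    while (tickNextIdle < curTick)
        tickNextIdle += clock;
    return tickNextIdle;
}

// Divide/multiply variant: jump straight to the first multiple of
// clock that lies strictly after curTick.
Tick advanceDivide(Tick tickNextIdle, Tick curTick, Tick clock) {
    if (tickNextIdle < curTick)
        tickNextIdle = (curTick / clock) * clock + clock;
    return tickNextIdle;
}

// Hybrid variant: try at most four adds, then fall back to the
// divide/multiply if tickNextIdle is still behind curTick.
Tick advanceHybrid(Tick tickNextIdle, Tick curTick, Tick clock) {
    for (int i = 0; i < 4 && tickNextIdle < curTick; ++i)
        tickNextIdle += clock;
    if (tickNextIdle < curTick)
        tickNextIdle = (curTick / clock) * clock + clock;
    return tickNextIdle;
}
```

With tickNextIdle = 0, curTick = 5500 and clock = 1000, all three return 6000. They are not strictly equivalent, though: when curTick is an exact multiple of clock (say 5000), the loop stops at 5000 while the divide/multiply yields 6000. Whether that difference matters for the bus is something I have not checked. Note also that the loop needs roughly curTick / clock iterations when tickNextIdle starts at zero, whereas the divide/multiply cost is constant.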
I might try your code sample, but I'm not sure that four comparisons, branches and additions can perform much faster than one comparison and an integer divide/multiply plus an add. It could be a little faster :)

P.S. I think that on restore from a checkpoint, tickNextIdle starts from zero. After that it increments in steps of 1000 up to curTick, which can be a really big number.

Best regards,

On Wednesday, 21 February 2007 17:06, Nathan Binkert wrote:
> Old:
> >> while (tickNextIdle < curTick)
> >>     tickNextIdle += clock;
> > New:
> > if (tickNextIdle < curTick)
> >     tickNextIdle = (curTick / clock) * clock + clock;
>
> We actually went back and forth about this. The question is, when is the
> divide better than the loop? We never profiled the code, so we didn't
> have evidence that one was better than the other. It seems that you're
> saying that the divide is better. We were even thinking that it might
> make sense to try the add say four times, and do the multiply if it didn't
> work out. (That way busy busses only pay the cost of an add or two, and
> slow busses escape with a divide/multiply.) Would you be willing to try
> something like this?
>
> if (tickNextIdle < curTick)
>     tickNextIdle += clock;
>
> if (tickNextIdle < curTick)
>     tickNextIdle += clock;
>
> if (tickNextIdle < curTick)
>     tickNextIdle += clock;
>
> if (tickNextIdle < curTick)
>     tickNextIdle += clock;
>
> if (tickNextIdle < curTick)
>     tickNextIdle = (curTick / clock) * clock + clock;
>
> It really has been quite a while since anyone in the core group has
> profiled M5, so we'd appreciate this sort of help from people! If you can
> identify something that's slow and what fraction of the total execution it
> accounts for, that'd be fantastic. Suggestions on improvement are an
> added bonus. A statement about what kind of speedup is achieved is icing
> on the cake.
>
> Thanks,
>
> Nate

--
Saša Tomić
BSC - Barcelona SuperComputing Center
c\ Jordi Girona 29, Nexus I, 08034 Barcelona, España
Tel.: +34671218062, +34934054289
http://www.bsc.es
