I was talking specifically about the performance of the simulator
itself, but it makes sense that benchmark performance in a CPU like O3
would suffer too.
Gabe
Quoting "Watanabe, Yasuko" <[email protected]>:
Applying the patch that splits the x86 condition code register
resulted in more significant performance degradation on my end. The
degradation is as large as 50% for simple micro-benchmarks due to
increased pressure on the physical registers. I have not tested SPEC
benchmarks yet.
The impact is large enough that avoiding unnecessary register reads
and writes should improve performance considerably.
Yasuko
-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of Gabe Black
Sent: Friday, May 25, 2012 12:45 AM
To: [email protected]
Subject: Re: [gem5-dev] performance impact of splitting up the registers
On 05/24/12 07:56, Nilay Vaish wrote:
On Thu, 24 May 2012, Gabe Black wrote:
Some quick testing shows that splitting the x86 condition code
register up adds about an 8% penalty on twolf simple atomic, relative
to a version of gem5 with some performance improvements that aren't
checked in yet. That's ok since this is something that needs to
happen, but that shows the overhead of reading the extra registers.
Avoiding reading any unnecessary registers (including the zero
register as a placeholder/substitute) will help recover that lost
performance.
Gabe, what performance improvements do you have in mind? We had
discussed before the possibility of the registers read/written by a
microop being decided at the time of construction of the microop,
instead of at compile time. From the discussion it seemed that we
would need to assess the execution time impact of any such change,
as it would probably affect all the ISAs.
--
Nilay
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
Just finding a way to avoid the unnecessary register reads, i.e. the
ones for bits that aren't being set or read, which is what we talked
about before. If we can simply avoid the reads without changing the
way the CPUs work, then it shouldn't affect the other ISAs. We'll
want to keep the changes localized to the way the StaticInsts are
set up, because technically that's all that really needs to change.
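To make the idea concrete, here is a rough sketch of choosing the
source registers when the instruction object is constructed, so that
flag registers which are never read never show up as sources. Every
name in it (ToyMicroop, FlagGroup, the register numbers) is made up
for illustration and is not gem5's actual StaticInst interface or the
actual way the condition code bits are grouped.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical grouping of the split condition code bits.
enum FlagGroup : std::uint8_t {
    GroupA = 1 << 0,
    GroupB = 1 << 1,
    GroupC = 1 << 2
};

// Hypothetical register indices for the split flag registers.
enum ToyReg : int {
    RegFlagsA = 100,
    RegFlagsB = 101,
    RegFlagsC = 102
};

// Toy stand-in for a microop; not gem5's StaticInst.
struct ToyMicroop {
    std::vector<int> srcRegs;

    // flagsRead is a mask of the flag groups this microop actually reads.
    ToyMicroop(int src1, int src2, std::uint8_t flagsRead)
        : srcRegs{src1, src2}
    {
        // Only list a flag register as a source if its bits are needed,
        // rather than always reading all of them (or a zero register
        // placeholder in their place).
        if (flagsRead & GroupA) srcRegs.push_back(RegFlagsA);
        if (flagsRead & GroupB) srcRegs.push_back(RegFlagsB);
        if (flagsRead & GroupC) srcRegs.push_back(RegFlagsC);
    }
};
```

A microop built with a mask of 0 ends up with just its two ordinary
sources, so the CPU model never sees the extra reads and does not need
to change.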
A larger goal might be to reduce the number of calls to the thread
context generally. Maybe we could add a readIntRegs (note the s) and
similar functions which read/write multiple registers at once and
reduce the function call overhead.
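As a rough sketch of what a batched read might look like: the
ToyThreadContext type, its calls counter, and the readIntRegs
signature below are all assumptions made for illustration, not an
existing gem5 interface.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy stand-in for a thread context; not gem5's ThreadContext.
struct ToyThreadContext {
    std::array<std::uint64_t, 16> intRegs{};
    std::size_t calls = 0;  // number of calls made into the context

    // One call per register, as is done today.
    std::uint64_t readIntReg(std::size_t idx)
    {
        ++calls;
        return intRegs.at(idx);
    }

    // Hypothetical batched variant: one call reads N registers.
    template <std::size_t N>
    std::array<std::uint64_t, N>
    readIntRegs(const std::array<std::size_t, N> &idx)
    {
        ++calls;
        std::array<std::uint64_t, N> out{};
        for (std::size_t i = 0; i < N; ++i)
            out[i] = intRegs.at(idx[i]);
        return out;
    }
};
```

Reading three registers this way costs one call into the context
instead of three. Whether that actually shows up as a measurable win
would have to be checked.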
It also wouldn't hurt to double check my measurement as far as how
much of an impact splitting up the registers made.
Gabe