To follow up on this thread: I've kept fine-tuning the configurations for
sim-outorder and M5 to the point where (I believe) they have identical
functional units, identical branch predictors, identical cache hierarchies and
similar processor configurations. Given the differences between the two
simulators, though, I suppose it's impossible to get a perfect match.

Anyway, I've simulated 1G instructions of each SPEC CINT2000 benchmark with
sim-outorder and M5, and for some metrics I'm getting a fairly good match: IPC
errors are within 10% and branch prediction accuracy is within 7%. However,
leaving aside the L1 icache, which you explained earlier, certain metrics are
behaving really strangely.
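
For reference, the icache explanation quoted below can be sketched as a toy
model. This is only an illustration, not code from either simulator: the
function name, the 4-wide fetch, the 32-byte line and the 4-byte instructions
are all assumptions made up for the example.

```python
def icache_accesses(pcs, fetch_width, line_bytes):
    """Count icache accesses for a PC trace under two counting policies.

    Returns (per_instruction, per_line_group):
      per_instruction: one access per fetched instruction.
      per_line_group:  one access per distinct icache line touched by each
                       group of up to fetch_width instructions (one "cycle").
    """
    per_instruction = len(pcs)
    per_line_group = 0
    for start in range(0, len(pcs), fetch_width):
        group = pcs[start:start + fetch_width]
        # Distinct cache lines touched by this fetch group.
        lines = {pc // line_bytes for pc in group}
        per_line_group += len(lines)
    return per_instruction, per_line_group


# Hypothetical straight-line code: 64 sequential 4-byte instructions.
pcs = [0x1000 + 4 * i for i in range(64)]
per_insn, per_group = icache_accesses(pcs, fetch_width=4, line_bytes=32)
print(per_insn, per_group, per_insn / per_group)
```

With straight-line code the ratio equals the fetch width; taken branches and
fetch groups that span lines would pull it down, which may be why a factor of
about two shows up in practice.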

For example, for bzip2 the unified L2 miss rate error is only 0.2%, but for the
same benchmark the L1 dcache miss rate error is almost 50%. gap has a dcache
miss rate error of only 3%, while its L2 error is 25%. And this story continues
for many of the other benchmarks.
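
To make the comparison concrete, here is a minimal sketch of how such an error
can be computed, assuming it is the relative difference between the two
simulators' miss rates with sim-outorder taken (arbitrarily) as the reference.
The helper names and the counts are invented for illustration and are not
measured results.

```python
def miss_rate(misses, accesses):
    """Miss rate as a fraction of accesses."""
    if accesses == 0:
        raise ValueError("no accesses recorded")
    return misses / accesses


def relative_error(reference, measured):
    """Relative error of measured with respect to reference."""
    if reference == 0:
        raise ValueError("reference value is zero")
    return abs(measured - reference) / abs(reference)


# Made-up counts, NOT simulation output: two small miss rates that differ
# by one percentage point.
ref = miss_rate(200, 10_000)    # hypothetical reference simulator
other = miss_rate(300, 10_000)  # hypothetical second simulator

abs_diff = abs(other - ref)
rel_err = relative_error(ref, other)
print(f"absolute difference: {abs_diff:.4f}, relative error: {rel_err:.1%}")
```

When the miss rates themselves are small, a small absolute difference turns
into a large relative error, so it may be worth looking at both numbers.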

Do you have any idea why these results are so variable? Is there a fundamental
difference in the way M5 and sim-outorder implement caches, or could it be due
to the processor models? Or do you suspect I'm doing something wrong? :)

I'd be inclined to think that M5, due to its complexity, approaches reality
better than sim-outorder, so maybe the problem is simply that SimpleScalar is
too Simple...

Thanks for all your help so far.

Regards,

Jos

Quoting Steve Reinhardt <[EMAIL PROTECTED]>:

> I believe SimpleScalar accesses the icache for each instruction, while
> M5's FullCPU only accesses the icache once per cycle for all the
> instructions it is fetching that are in the same icache line.
> (Depending on how aggressive your fetch model is, M5 will access the
> icache multiple times in a cycle, but only if it's fetching a set of
> instructions that span icache blocks.)  I wouldn't be surprised if the
> number of icache accesses in SimpleScalar is larger by a factor of about
> the issue width of the machine.
>
> Steve
>
> Jos Delbar wrote:
> > Hey,
> >
> > I am trying to tune the configurations of M5 and sim-outorder to achieve
> > similar IPC results for uniprocessor simulations. Using more or less
> > identical processor configurations (as good as possible considering the
> > differences between the two simulators), identical cache configurations,
> > etc., I am getting "reasonable" results. One statistic is bothering me
> > though, and that is the amount of L1 icache lookups. For an identical
> > benchmark, sim-outorder reports +/- twice as many icache lookups as M5. L1
> > data and L2 unified cache lookups only differ a few percent.
> >
> > Does anyone have an idea where this huge difference is coming from?
> >
> > Thanks,
> >
>
>
> -------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
> for problems?  Stop!  Download the new AJAX search engine that makes
> searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
> http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
> _______________________________________________
> m5sim-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/m5sim-users
>

