Thanks for the data.  Looks like a definite win.

I'm also surprised that the FastAlloc version is slower.  I can believe
that the glibc malloc has improved to the point where it wouldn't be much
of a win, but I don't know what it would be doing to actually be faster,
since my recollection of FastAlloc is that it does a pretty minimal amount
of work on allocation.  If these results hold in general, we may want to
get rid of FastAlloc altogether.
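
For reference, here is a minimal sketch of the kind of fixed-size free-list
allocation I have in mind. It is an illustration only: FreeListPool and its
members are made-up names, and this is not the FastAlloc code. The point is
that the fast path is a pointer pop, so a general-purpose malloc would need a
similarly short path for small objects to keep up.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical fixed-size pool for illustration; NOT gem5's FastAlloc.
// Freed blocks are kept on a singly linked free list and handed back
// on the next allocation without calling the general allocator.
template <std::size_t Size>
class FreeListPool
{
    union Slot
    {
        Slot *next;  // valid only while the block sits on the free list
        alignas(std::max_align_t) unsigned char bytes[Size];
    };

    Slot *head = nullptr;

  public:
    void *
    allocate()
    {
        if (head) {
            // Fast path: pop the most recently freed block.
            Slot *s = head;
            head = s->next;
            return s;
        }
        // Slow path: fall back to the global allocator.
        return ::operator new(sizeof(Slot));
    }

    void
    deallocate(void *p)
    {
        // Reuse the block's own storage to hold the free-list link.
        Slot *s = new (p) Slot;
        s->next = head;
        head = s;
    }

    ~FreeListPool()
    {
        // Return any blocks still on the free list.
        while (head) {
            Slot *s = head;
            head = s->next;
            ::operator delete(s);
        }
    }
};
```

A block that has just been freed is the next one handed out, which is also
roughly what a per-size-class bin in a modern malloc does.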

Steve

On Sun, Apr 8, 2012 at 1:25 AM, Gabe Black <[email protected]> wrote:

> Attached are a few supplemental tidbits to go along with this change.
> The first is a file called trietimes.txt which shows the before and
> after of this change performance-wise using the atomic CPU to boot Linux
> and to run the twolf regression. I chose the atomic CPU because it would
> emphasize the impact of improving address translation. For booting,
> simulator performance improved by a little more than 16.5%, and for
> twolf a little more than 22.7%. I'm guessing twolf was a little better
> because SE doesn't muck with devices and translation is an even bigger
> part of what it does.
>
> I also tried a couple variations, one where I used FastAlloc for the
> internal Node struct used in the trie, and one where I cached the last
> successful lookup in the trie. Whenever the structure of the trie
> changed, I threw away the cache. In both cases, performance was similar
> but slightly worse. I was surprised especially that FastAlloc didn't
> help, but maybe glibc's malloc does a really good job with small objects
> now? The addrtrie.hh from both of these are attached for reference.
>
> I don't know if this class will give a similar performance boost to
> other ISAs or if x86's translation was just particularly stinky before.
> I expect it probably will help a little bit since a trie is such a
> well-suited data structure for this sort of thing, but it's hard to say.
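
To make the "well suited" point concrete, here is a rough sketch of a
one-bit-per-level trie keyed on the high bits of an address. It is an
illustration under assumptions: BitTrie, insert and lookup are hypothetical
names, and the real AddrTrie in the review may be organized quite differently.
Each entry is stored under a prefix whose width depends on the page size, so
a single walk finds the matching page regardless of how big it is.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical one-bit-per-level trie for illustration; the AddrTrie
// class in the actual change may differ in structure and interface.
template <class Value>
class BitTrie
{
    struct Node
    {
        Value *value = nullptr;        // set if a prefix ends here
        std::unique_ptr<Node> kids[2]; // children for bit 0 and bit 1
    };

    static constexpr unsigned AddrBits = 64;
    Node root;

    // Bit of addr at the given depth, counting from the most
    // significant bit (depth 0).
    static unsigned
    bit(std::uint64_t addr, unsigned depth)
    {
        return (addr >> (AddrBits - 1 - depth)) & 1;
    }

  public:
    // Map the top `width` bits of `addr` to `value` (not owned).
    // Assumes width <= 64.
    void
    insert(std::uint64_t addr, unsigned width, Value *value)
    {
        Node *n = &root;
        for (unsigned d = 0; d < width; ++d) {
            auto &kid = n->kids[bit(addr, d)];
            if (!kid)
                kid = std::make_unique<Node>();
            n = kid.get();
        }
        n->value = value;
    }

    // Return the value stored under the longest prefix that matches
    // addr, or nullptr if no stored prefix matches.
    Value *
    lookup(std::uint64_t addr) const
    {
        const Node *n = &root;
        Value *best = n->value;
        for (unsigned d = 0; d < AddrBits; ++d) {
            n = n->kids[bit(addr, d)].get();
            if (!n)
                break;
            if (n->value)
                best = n->value;
        }
        return best;
    }
};
```

With 64-bit addresses, a 4K page would be inserted with a width of 52 bits
and a 2M page with a width of 43 bits, and both are found by the same lookup.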
>
> Gabe
>
> On 04/08/12 01:02, Gabe Black wrote:
> > -----------------------------------------------------------
> > This is an automatically generated e-mail. To reply, visit:
> > http://reviews.gem5.org/r/1143/
> > -----------------------------------------------------------
> >
> > Review request for Default.
> >
> >
> > Description
> > -------
> >
> > Changeset 8945:f40e80105a03
> > ---------------------------
> > X86: Use the AddrTrie class to implement the TLB.
> >
> > This change also adjusts the TlbEntry class so that it stores the number of
> > address bits wide a page is rather than its size in bytes. In other words,
> > instead of storing 4K for a 4K page, it stores 12. 12 is easy to turn into 4K,
> > but it's a little harder going the other way.
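
A small illustration of why storing the bit count is convenient. The helper
names below are hypothetical and not taken from the change itself. Going from
bits to bytes (or to an offset mask) is a single shift, while going from
bytes back to bits needs a loop or a count-leading-zeros style operation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; not part of the TlbEntry class.

// Bits -> bytes is a single shift: 12 -> 4096.
constexpr std::uint64_t
pageBytes(unsigned logBytes)
{
    return std::uint64_t(1) << logBytes;
}

// The mask for the offset within a page also falls out directly.
constexpr std::uint64_t
pageOffsetMask(unsigned logBytes)
{
    return pageBytes(logBytes) - 1;
}

// Bytes -> bits takes more work. This version assumes `bytes` is a
// power of two and counts how many times it can be halved.
inline unsigned
pageLogBytes(std::uint64_t bytes)
{
    unsigned bits = 0;
    while (bytes > 1) {
        bytes >>= 1;
        ++bits;
    }
    return bits;
}
```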
> >
> >
> > Diffs
> > -----
> >
> >   src/arch/x86/pagetable.hh a47fd7c2d44e
> >   src/arch/x86/pagetable.cc a47fd7c2d44e
> >   src/arch/x86/pagetable_walker.hh a47fd7c2d44e
> >   src/arch/x86/pagetable_walker.cc a47fd7c2d44e
> >   src/arch/x86/tlb.hh a47fd7c2d44e
> >   src/arch/x86/tlb.cc a47fd7c2d44e
> >   src/arch/x86/vtophys.cc a47fd7c2d44e
> >
> > Diff: http://reviews.gem5.org/r/1143/diff/
> >
> >
> > Testing
> > -------
> >
> >
> > Thanks,
> >
> > Gabe Black
> >
> > _______________________________________________
> > gem5-dev mailing list
> > [email protected]
> > http://m5sim.org/mailman/listinfo/gem5-dev