Thanks for the data. Looks like a definite win. I'm also surprised that the FastAlloc version is slower. I can believe that the glibc malloc has improved to the point where it wouldn't be much of a win, but I don't know what it would be doing to actually be faster, since my recollection of FastAlloc is that it does a pretty minimal amount of work on allocation. If these results hold in general, we may want to get rid of FastAlloc altogether.
Steve

On Sun, Apr 8, 2012 at 1:25 AM, Gabe Black <[email protected]> wrote:
> Attached are a few supplemental tidbits to go along with this change.
> The first is a file called trietimes.txt which shows the before and
> after of this change performance wise using the atomic CPU to boot linux
> and to run the twolf regression. I chose the atomic CPU because it would
> emphasize the impact of improving address translation. For booting,
> simulator performance improved by a little more than 16.5%, and for
> twolf a little more than 22.7%. I'm guessing twolf was a little better
> because SE doesn't muck with devices and translation is an even bigger
> part of what it does.
>
> I also tried a couple variations, one where I used FastAlloc for the
> internal Node struct used in the trie, and one where I cached the last
> successful lookup in the trie. Whenever the structure of the trie
> changed, I threw away the cache. In both cases, performance was similar
> but slightly worse. I was surprised especially that FastAlloc didn't
> help, but maybe glibc's malloc does a really good job with small objects
> now? The addrtrie.hh from both of these are attached for reference.
>
> I don't know if this class will give a similar performance boost to
> other ISAs or if x86's translation was just particularly stinky before.
> I expect it probably will help a little bit since a trie is such a well
> suited data structure for this sort of thing, but it's hard to say.
>
> Gabe
>
> On 04/08/12 01:02, Gabe Black wrote:
> > -----------------------------------------------------------
> > This is an automatically generated e-mail. To reply, visit:
> > http://reviews.gem5.org/r/1143/
> > -----------------------------------------------------------
> >
> > Review request for Default.
> >
> >
> > Description
> > -------
> >
> > Changeset 8945:f40e80105a03
> > ---------------------------
> > X86: Use the AddrTrie class to implement the TLB.
> >
> > This change also adjusts the TlbEntry class so that it stores the number of
> > address bits wide a page is rather than its size in bytes. In other words,
> > instead of storing 4K for a 4K page, it stores 12. 12 is easy to turn into 4K,
> > but it's a little harder going the other way.
> >
> >
> > Diffs
> > -----
> >
> >   src/arch/x86/pagetable.hh a47fd7c2d44e
> >   src/arch/x86/pagetable.cc a47fd7c2d44e
> >   src/arch/x86/pagetable_walker.hh a47fd7c2d44e
> >   src/arch/x86/pagetable_walker.cc a47fd7c2d44e
> >   src/arch/x86/tlb.hh a47fd7c2d44e
> >   src/arch/x86/tlb.cc a47fd7c2d44e
> >   src/arch/x86/vtophys.cc a47fd7c2d44e
> >
> > Diff: http://reviews.gem5.org/r/1143/diff/
> >
> >
> > Testing
> > -------
> >
> >
> > Thanks,
> >
> > Gabe Black
> >
> > _______________________________________________
> > gem5-dev mailing list
> > [email protected]
> > http://m5sim.org/mailman/listinfo/gem5-dev
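
For reference, here is a minimal sketch of the page-size point in the quoted change description: 12 turns into 4K with a single shift, while going from 4K back to 12 takes a loop or a bit-counting builtin. The helper names (pageBytes, pageLogBytes, pageOffsetMask) are hypothetical and are not taken from the gem5 source or from the actual TlbEntry class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not gem5 code.

// Width in address bits -> size in bytes: one shift.
inline std::uint64_t pageBytes(unsigned logBytes)
{
    return std::uint64_t{1} << logBytes;
}

// Size in bytes -> width in address bits: needs a loop (or a
// count-trailing-zeros builtin). Assumes bytes is a nonzero power of two.
inline unsigned pageLogBytes(std::uint64_t bytes)
{
    unsigned bits = 0;
    while (bytes > 1) {
        bytes >>= 1;
        ++bits;
    }
    return bits;
}

// Mask selecting the offset-within-page bits of an address.
inline std::uint64_t pageOffsetMask(unsigned logBytes)
{
    return pageBytes(logBytes) - 1;
}
```

Storing the bit count also makes the offset mask and the page-aligned base address cheap to derive, which may be part of why it suits a trie keyed on address bits, though that is a guess rather than something stated in the change.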
