Still no file... Glad to see you looking into this. Quick thoughts:
- Those DynInst structures do seem huge. Slimming them down seems like it would be a big win.
- I wouldn't take it on faith that FastAlloc is faster. It might be, but I wrote that a *long* time ago, and malloc has probably improved in the interim. It's easy to compile without it (I think there's a NO_FAST_ALLOC flag); it would be interesting to see if that matters. Of course, if most of the allocations aren't using it anyway, maybe that won't tell the whole story.
- I wonder if all the ref-counting pointer stuff is because we're copying the pointers a lot instead of sharing references to them (e.g., as parameters to short function calls). Basically you end up incrementing and then decrementing the ref count if you pass a pointer by value instead of by reference, IIRC.

Steve

On Tue, May 29, 2012 at 8:01 AM, Ali Saidi <[email protected]> wrote:

> Let's try to attach the file again...
>
> Ali
>
>
> On 5/29/12 10:27 AM, "Ali Saidi" <[email protected]> wrote:
>
> >We recently took a look at the callgraph from gem5 with an O3 CPU
> >and it's pretty startling (see attached picture). The majority of time
> >is spent in memory management. The biggest chunk of this is in fetch
> >when instructions are built; however, I had assumed that FastAlloc would be
> >used. Nominally it would be, except that with both ARM and x86 the size
> >of a DynInst is > 512 bytes, which is the max size FastAlloc handles.
> >Alpha seems to sneak under the limit, but either way it is astounding to
> >me that a single instruction requires over 0.5 kB of storage. Doing some
> >quick math, if more than 64 DynInsts exist in the system they don't fit
> >in the L1 cache anymore. One thing we can do is increase the max size of
> >FastAlloc to 1 kB, but it seems like we need to think about how to
> >slim down a DynInst. I've looked over it and it seems like we lose
> >around 48 bytes to alignment issues, as members are scattered throughout
> >and they are Addrs, bools and then more Addrs.
> >It seems like changing some of the bools we currently have to
> >setters/getters with an underlying bitvector might help, and we might
> >want to think about packing the most-used members together as opposed
> >to the somewhat random approach we have right now. You could nearly
> >halve it if the processor of interest doesn't have > 256 physical
> >registers.
> >
> >Furthermore, looking at the picture, there seem to be plenty of other
> >places where there are a lot of calls to new (teal-ish) / free (orange).
> >It seems like we could certainly make more use of FastAlloc, assuming
> >it's actually helping.
> >
> >Thanks,
> >
> >Ali
>
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev
