Thanks for your interest in improving gem5!

The idea of doing binary translation to improve performance (particularly for functional fast-forwarding) has come up before, but we haven't crossed that bridge for several reasons:

1. Most of all, it's really, really hard, and there are always plenty of other more pressing things to work on.
2. Our current ISA descriptions weren't set up with this in mind, so it would probably require reworking the ISA descriptions in addition to building the framework.
3. Other groups (like QEMU and AMD's SimNow) have already built binary translation tools that are way better than anything we would do.
4. For x86, at least, the idea of using hardware virtualization provides an alternative that could have even higher performance than binary translation.

What we've generally been thinking of as a more desirable and achievable alternative would be to interoperate with another environment like QEMU, SimNow, or KVM so that you could run at high speed in one of these other tools, extract the system state, load it into gem5, and then run a detailed simulation from there. Gabe Black did a little exploration of KVM quite a while ago, but I don't think he got that far (correct me if I'm wrong, Gabe). I also did a little internal playing around with SimNow but nothing I can release. Other than that I don't know of anyone who's worked on this yet.

Since the issues are pretty much the same, I'll use the term EE to refer to a high-speed emulated environment, whether it's QEMU, SimNow, KVM, or something else. In theory, it's pretty straightforward; architectural CPU and memory state is pretty well defined, and most of these systems have checkpoint/snapshot capability, so it's simply a matter of running in one of these EEs, saving a checkpoint, and loading it up into gem5. The big challenge really revolves around devices: the set of devices that gem5 supports doesn't necessarily intersect with those that these EEs support, and the internal state representation is guaranteed to be different.

I think the best solution to the device problem is to find a way to use the *same* device models in both the EE and in gem5, either by grafting the EE's device models into gem5 or the other way around. For KVM, you'd have to use gem5's models, since KVM by itself has no device models. For other EEs, there are potential benefits to finding a way to port their device models into gem5, since I expect they have more models (and more complete models) than we do (certainly for SimNow I know that's true).

However, a big potential downside of incorporating other device models is licensing.
I know QEMU is GPL, which is problematic for us (since we use a BSD-based license, and that's very important to us given the number of companies involved with gem5). Anything that would contaminate gem5 with GPL is unacceptable. I haven't looked into QEMU enough to know if this is something that can be worked around or not.

Also, while SimNow has a lot of appeal for those of us at AMD, I can see where people would prefer an open-source and multi-ISA solution. SimNow is probably more feasible than you might think, though, since there is a free binary version available (http://developer.amd.com/tools/simnow/pages/default.aspx), and we have contacts in the SimNow group to explore opening up additional internal interfaces etc. if that proves necessary.

I think KVM might be the most appealing avenue; it does tie us even more to Linux than we are already, but that's the only major downside I see. It also doesn't support all our ISAs, but Wikipedia says it does support PowerPC in addition to x86, and there is an ARM port in the works (http://systems.cs.columbia.edu/projects/kvm-arm/). I expect that x86+ARM covers the vast and growing majority of our user base.

Just to be complete, I'll mention that I'm sure there are opportunities to improve the performance of the existing gem5 ISA simulation/emulation that are simpler and more feasible than doing binary translation in gem5, but I expect those opportunities are more like tens of percent speedup rather than the order(s?) of magnitude or so you'd probably get out of going to something like KVM.

I'd really be glad to see something along these lines happen, and am happy to help to the extent I can. I'm also interested if some of the other developers have a different opinion or further insights.

Steve

On Sun, Mar 25, 2012 at 7:39 PM, Pablo Ortiz <[email protected]> wrote:
> Hello dev group,
>
> My group is looking at the possibility of improving the performance of
> GEM5 for the purpose of simulating an Android environment.
> In QEMU, there is a step performed during binary translation in which
> basic code blocks are translated and cached for execution, to avoid the
> overhead of re-translating common, previously translated code blocks.
> Would such an optimization be reasonable, doable, or even sensible in the
> context of GEM5? I would love to hear the thoughts of the mailing list.
> I would like to thank, in advance, any who wish to respond to this email.
>
> Cheers,
> El
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev
