Thanks for your interest in improving gem5!

The idea of doing binary translation to improve performance (particularly
for functional fast-forwarding) has come up before, but we haven't crossed
that bridge for several reasons:
1. Most of all, it's really, really hard, and there are always plenty of
other, more pressing things to work on.
2. Our current ISA descriptions weren't set up with this in mind, so it
would probably require reworking the ISA descriptions in addition to
building the framework.
3. Other groups (like QEMU and AMD's SimNow) have already built binary
translation tools that are way better than anything we would do.
4. For x86, at least, the idea of using hardware virtualization provides an
alternative that could have even higher performance than binary translation.
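
For anyone following along, the block-caching idea from the original
question can be illustrated with a toy sketch.  This is not gem5 or QEMU
code: the three-opcode "guest ISA", the TranslationCache class, and the use
of Python closures as a stand-in for generated host code are all
assumptions made up for illustration.  It only shows the core trick:
translate a basic block the first time its start PC is reached, then reuse
the cached result.

```python
from typing import Callable, Dict, List, Tuple

# Toy guest ISA (hypothetical): each instruction is (opcode, operand).
Instr = Tuple[str, int]
# A translated block runs against the register state and returns the next PC.
Block = Callable[[Dict[str, int]], int]


class TranslationCache:
    def __init__(self, program: List[Instr]):
        self.program = program
        self.cache: Dict[int, Block] = {}  # start PC -> translated block
        self.translations = 0              # how many blocks were translated

    def translate(self, pc: int) -> Block:
        """Turn the basic block starting at pc into a single closure."""
        self.translations += 1
        ops: List[Instr] = []
        while True:
            op, arg = self.program[pc]
            pc += 1
            ops.append((op, arg))
            if op in ("jnz", "halt"):  # a branch or halt ends the block
                break
        fallthrough = pc

        def run(state: Dict[str, int]) -> int:
            for op, arg in ops:
                if op == "add":
                    state["acc"] += arg
                elif op == "dec":
                    state["ctr"] -= arg
                elif op == "jnz":
                    return arg if state["ctr"] != 0 else fallthrough
                elif op == "halt":
                    return -1  # negative PC stops execution
            return fallthrough

        return run

    def execute(self, state: Dict[str, int], pc: int = 0) -> Dict[str, int]:
        while pc >= 0:
            block = self.cache.get(pc)
            if block is None:          # miss: translate once and remember it
                block = self.translate(pc)
                self.cache[pc] = block
            pc = block(state)          # hit: reuse the earlier translation
        return state
```

Running a five-iteration loop through this executes the loop body five
times but translates it only once; a real translator would emit host
machine code instead of closures and would also need block chaining and
invalidation for self-modifying code.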

What we've generally considered a more desirable and achievable
alternative is to interoperate with another environment like QEMU,
SimNow, or KVM: run at high speed in one of these other tools, extract
the system state, load it into gem5, and then run a detailed simulation
from there.  Gabe Black did a little exploration of KVM quite a
while ago, but I don't think he got that far (correct me if I'm wrong,
Gabe).  I also did a little internal playing around with SimNow but nothing
I can release.  Other than that I don't know of anyone who's worked on this
yet.

Since the issues are pretty much the same, I'll use the term EE to refer to
a high-speed emulated environment, whether it's QEMU, SimNow, KVM, or
something else.

In theory, it's pretty straightforward; architectural CPU and memory state
is pretty well defined, and most of these systems have checkpoint/snapshot
capability, so it's simply a matter of running in one of these EEs, saving
a checkpoint, and loading it up into gem5.  The big challenge really
revolves around devices: the set of devices that gem5 supports doesn't
necessarily intersect with those that these EEs support, and the internal
state representation is guaranteed to be different.
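
A rough sketch of that flow, and of where devices get in the way, might
look like the following.  Everything here is hypothetical: the EESnapshot
structure, the section and key names, and the convert() helper are
invented for illustration and do not reflect the actual snapshot format of
any EE or gem5's real checkpoint layout.  CPU registers and memory map
across directly, while any device without a counterpart on the gem5 side
can only be reported, not converted.

```python
import configparser
import io
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class EESnapshot:
    """Hypothetical, simplified view of a snapshot exported by an EE."""
    regs: Dict[str, int]
    memory: bytes
    devices: Dict[str, Dict[str, int]] = field(default_factory=dict)


def convert(snapshot: EESnapshot,
            supported_devices: Set[str]) -> Tuple[str, List[str]]:
    """Return (INI-style checkpoint text, names of devices we cannot map)."""
    cpt = configparser.ConfigParser()
    cpt.optionxform = str  # keep register names case-sensitive

    # Architectural state is well defined, so it transfers directly.
    cpt["cpu"] = {name: hex(val) for name, val in snapshot.regs.items()}
    cpt["physmem"] = {"size": str(len(snapshot.memory))}

    # Device state only transfers when both sides model the same device.
    unsupported: List[str] = []
    for name, state in sorted(snapshot.devices.items()):
        if name in supported_devices:
            cpt["dev." + name] = {k: str(v) for k, v in state.items()}
        else:
            unsupported.append(name)

    out = io.StringIO()
    cpt.write(out)
    return out.getvalue(), unsupported
```

Even in this toy version the CPU and memory sections are trivial, and all
of the real work would end up in the per-device branch, where a faithful
converter has to translate one model's internal state into another's.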

I think the best solution to the device problem is to find a way to use the
*same* device models in both the EE and in gem5, either by grafting the
EE's device models into gem5 or the other way around.  For KVM, you'd have
to use gem5's models, since KVM by itself has no device models.  For other
EEs, there are potential benefits to finding a way to port their device
models into gem5, since I expect they have more models (and more complete
models) than we do (certainly for SimNow I know that's true).

However, a big potential downside of incorporating other device models
is licensing.  I know QEMU is GPL, which is problematic for us (since we
use a BSD-based license, and that's very important to us given the number
of companies involved with gem5).  Anything that would contaminate gem5
with GPL is unacceptable.  I haven't looked into QEMU enough to know if
this is something that can be worked around or not.

Also, while SimNow has a lot of appeal for those of us at AMD, I can see
where people would prefer an open-source and multi-ISA solution.  SimNow is
probably more feasible than you might think, though, since there is a free
binary version available (
http://developer.amd.com/tools/simnow/pages/default.aspx), and we have
contacts in the SimNow group to explore opening up additional internal
interfaces etc. if that proves necessary.

I think KVM might be the most appealing avenue; it does tie us even more to
Linux than we are already, but that's the only major downside I see.  It
also doesn't support all our ISAs, but Wikipedia says it does support
PowerPC in addition to x86, and there is an ARM port in the works (
http://systems.cs.columbia.edu/projects/kvm-arm/).  I expect that x86+ARM
covers the vast and growing majority of our user base.

Just to be complete, I'll mention that I'm sure there are opportunities to
improve the performance of the existing gem5 ISA simulation/emulation that
are simpler and more feasible than doing binary translation in gem5, but I
expect those opportunities are more like tens of percent speedup rather
than the order(s?) of magnitude or so you'd probably get out of going to
something like KVM.

I'd really be glad to see something along these lines happen, and am happy
to help to the extent I can.  I'm also interested if some of the other
developers have a different opinion or further insights.

Steve

On Sun, Mar 25, 2012 at 7:39 PM, Pablo Ortiz <[email protected]> wrote:

> Hello dev group,
>
> My group is looking at the possibility of improving the performance of
> GEM5 for the purpose of simulating an Android environment. In QEMU, there
> is a step performed during binary translation in which basic code blocks
> are translated and cached to be executed to avoid the overhead of having to
> translate common, previously translated code blocks. Would such an
> optimization be reasonable, doable, or even sensible in the context of
> GEM5? I would love to hear the thoughts of the mailing list. I would like
> to thank, in advance, any who wish to respond to this email.
>
> Cheers,
> El
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev
>