Hi folks.

When we think about how python is used by gem5, I think many people (myself included) tend to treat it as a monolithic entity, or in other words assume that there is just *the* python that gets used by everything. The real situation is a little more complicated than that.
On the one hand, we have "the" version of the python interpreter itself, which is used to directly run python scripts. This is again an approximation, because we could, for instance, have one version of python running SCons itself and another version running the scripts gem5 provides, but for simplicity let's assume that's not the case, or at least that it doesn't matter.

On the other hand, we have the python libraries which get built into gem5 and provide its embedded interpreter. These may not exist at all on the target system, or they could exist but be a different version than the one you'd get by running the command line "python foo.py".

**Marshal binary**

The first place this has caused complications is where gem5's python code is packaged up by the build system so that it can be embedded into gem5 in either library form or executable form. From digging around in the repository, I've found that a long, long time ago gem5 was only a binary, and the python code was embedded into it by tacking a zip archive onto the end where gem5 could find it. Apparently this doesn't work for libraries, and so the python code needed to be bundled up and more formally injected into the binary as data. Today this is done by turning the python code into a binary representation, compressing it into a zip file, and then building the resulting file into a byte array declared in c++.

There are, to a first order, two main ways to serialize python "stuff" into a string/byte array: pickling or marshalling. Both have their advantages and disadvantages, partially covered here:

https://docs.python.org/3/library/pickle.html#module-pickle

The deal-breaking disadvantage of the pickle module is that it does *not* handle code, and so all the code inside our python modules would be lost, leaving only the data.
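The pickle limitation is easy to see in a few lines. The following is only an illustrative sketch, not code from gem5: the module source and the `greet` function are made up for the example. It compiles a tiny module body into a code object, shows that pickle rejects it, and shows that marshal can round-trip it.

```python
import marshal
import pickle

# A made-up module body, compiled into a code object the way any .py file is.
source = "def greet():\n    return 'hello from embedded code'\n"
code = compile(source, "<example>", "exec")

# pickle refuses to serialize code objects.
try:
    pickle.dumps(code)
    pickle_handles_code = True
except (TypeError, pickle.PicklingError):
    pickle_handles_code = False

# marshal serializes the code object to bytes and loads it back,
# so the function it defines can still be executed afterwards.
blob = marshal.dumps(code)
namespace = {}
exec(marshal.loads(blob), namespace)
greeting = namespace["greet"]()

print(pickle_handles_code, greeting)
```

Note that pickle can serialize a reference to a function (its module and name), but not the function's code itself, which is why it is not enough for shipping whole modules.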
The main disadvantage of marshal is that there is *no* guarantee that it will be compatible between different versions of python, since it's intended as a mostly internal representation for, for instance, ".pyc" files. To solve this problem, Andreas added a "marshal" binary which is written in c++ and includes the python interpreter like gem5 itself would. It runs a small embedded script which uses the appropriate version of the marshal module to serialize the code we want. Since that code runs under the same version of the interpreter as gem5 itself would use, the data will be compatible when gem5 tries to use it.

**Collection of SimObject information**

In order to figure out what SimObjects there are, we must actually import all the *.py files which contain them and which have been declared to the build system. Once they've all been imported, the SimObject base class will have collected a list of all of the new subclasses, which can then be processed by subsequent steps, generating params structs, etc.

Doing that requires a couple of pieces of fairly heavy infrastructure within the build system.

First, we must, even before any of gem5's c++ components have been built, build enough python infrastructure that SimObjects can work more or less like normal and import what they need, etc. To support that, we have a complex custom import handler in the SConscript which pipes around the .py SimObject modules we know about, with a little extra to support m5.defines.buildEnv. Even with this complex infrastructure in place, this only *mostly* covers what SimObjects might try to use. If someone writes a SimObject and puts code in it to import _m5.foo blindly, that *will* blow up, since _m5 is for c++ modules which don't exist yet at that point and so can't possibly be provided.
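To make the mechanism above a bit more concrete, here is a minimal sketch of the two ingredients: a base class that records its subclasses as they are defined, and an import handler that serves module source from a dictionary instead of from disk. Everything named here (`SimObjectBase`, `SourceDictImporter`, `fake_simobject_base`, `fake_cache`, `fake_bad`) is hypothetical and only stands in for the real thing; the actual SimObject base class and the handler in the SConscript are more involved and may collect subclasses differently.

```python
import importlib
import importlib.abc
import importlib.util
import sys
import types


class SimObjectBase:
    """Hypothetical stand-in for the real SimObject base class."""
    registry = {}

    def __init_subclass__(cls, **kwargs):
        # Record every subclass at the moment its class body is executed.
        super().__init_subclass__(**kwargs)
        SimObjectBase.registry[cls.__name__] = cls


class SourceDictImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serves only the modules whose source was declared up front."""

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self.sources:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # not ours; let the remaining finders try

    def exec_module(self, module):
        code = compile(self.sources[module.__name__], module.__name__, "exec")
        exec(code, module.__dict__)


# Make the stand-in base class importable by the fake SimObject modules.
base_module = types.ModuleType("fake_simobject_base")
base_module.SimObjectBase = SimObjectBase
sys.modules["fake_simobject_base"] = base_module

sources = {
    "fake_cache": (
        "from fake_simobject_base import SimObjectBase\n"
        "class FakeCache(SimObjectBase):\n"
        "    pass\n"
    ),
    # Imports a c++ backed module, which nothing can provide at this stage
    # (assumes no module named _m5 is importable where this sketch runs).
    "fake_bad": "import _m5.foo\n",
}
sys.meta_path.insert(0, SourceDictImporter(sources))

importlib.import_module("fake_cache")

try:
    importlib.import_module("fake_bad")
    bad_import_failed = False
except ImportError:
    bad_import_failed = True

print(sorted(SimObjectBase.registry), bad_import_failed)
```

The second module shows the failure mode described above: the handler can only hand out modules it was told about, so an import of anything backed by c++ fails.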
Second, and perhaps more subtly, since we assume certain capabilities in our build system, this means our build system *must* be able to execute python code, because it *must* be able to import these SimObject .py files to know what to build and how to build it. This makes it impossible to ever use a build system which does not allow the execution of arbitrary python as part of its build flow without, for instance, some sort of elaborate harness which escapes to an external python interpreter, does the import there, and then passes the resulting complex data back.

**--without-python interaction with the gem5 library and binary**

There is a fairly fundamental conflict between the gem5 library and the gem5 binary as far as the --without-python flag is concerned. At one time, even when the --without-python flag was provided, gem5 would look for the python libraries for the embedded interpreter anyway, and fail if they were absent. This was changed, implying that --without-python means you don't want to use embedded python at all (as distinct from script python, which you need to even run SCons to begin with).

Unfortunately, it's currently impossible to build the gem5 binary without python, since its main function runs as python and it has to run config scripts. That means if you run this command:

    scons --without-python build/X86/gem5.opt

it will fail with an obscure build error and not tell you that that combination is not allowed.

When talking specifically about the library, --without-python can make sense, because you're delegating the main function to somebody else, and there is a c++ config *loading* mechanism. Note that I say a c++ config *loading* mechanism, since this mechanism does not create a config from scratch; it simply loads an ini file created earlier in a bootstrapping step. This implies that, at least at some point, you were able to build and run a version of gem5 which *did* have the interpreter in it, even if that was not on the current machine.
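As an aside, the obscure failure could in principle be replaced by an explicit early check. The sketch below is purely illustrative and is not something the build does today: `check_python_option` is a hypothetical helper, and both the library target name `libgem5_opt.so` and the rule for telling binaries apart from libraries (a file name starting with "gem5.") are assumptions made for the example.

```python
def check_python_option(targets, without_python):
    """Return an error message for a disallowed combination, else None.

    Hypothetical helper: assumes gem5 binaries are the targets whose file
    name starts with "gem5." (e.g. build/X86/gem5.opt).
    """
    if not without_python:
        return None
    for target in targets:
        name = target.rsplit("/", 1)[-1]
        if name.startswith("gem5."):
            return (
                f"{target}: the gem5 binary cannot be built with "
                "--without-python, because its main function needs the "
                "embedded interpreter"
            )
    return None


# The combination from the command above is rejected with a clear message.
print(check_python_option(["build/X86/gem5.opt"], without_python=True))
# A library target (assumed name) is still allowed.
print(check_python_option(["build/X86/libgem5_opt.so"], without_python=True))
```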
**Incompatibility between script and binary python for SimObjects**

Since we have two different pythons, one for scripts and one embedded, it is entirely possible that one version supports some feature which the other doesn't. Even though we require version 3.6 or greater of python, new features are added all the time and may, for instance, have been introduced in version 3.7 or 3.8.

It is *not* guaranteed that, when importing SimObjects into SCons as part of the build process, we'll even get the same set of SimObjects, period, let alone equivalent definitions of those objects. For example, you could hypothetically check whether a certain feature of python was available, and if yes define SimObject A, and if no define SimObject B.

**Suggested change**

What I think might help solve this problem is to *require* there to be an embeddable python interpreter no matter what, whether you're going to build it into the library, or build the gem5 binary, or not.

Then, we would expand the "marshal" program so that, instead of being custom made to marshal our python blobs, it would run arbitrary python code using the built-in interpreter like gem5 would. We could then use the same import handler as real gem5, removing one extra copy. We could also teach it to import the blobs we've already made with earlier invocations using the wrapper itself, or put code in the script we run inside it which does the importing.

Finally, we would require *all* code which is targeted at running inside gem5 to be run by this interpreter wrapper. The wrapper would be able to marshal python modules just like it does today, although with the small script that is currently inside it provided from outside instead. It would also be able to import all the SimObjects as blobs and report what SimObjects exist, and then it could output the C++ code which defines the Param structs, etc.
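As a rough idea of the kind of small script that could be handed to the wrapper for the marshalling step, here is a sketch. It is not the script gem5 uses: the function names, the generated C++ symbol names, and the use of zlib for the compression step are all assumptions made for illustration.

```python
import marshal
import zlib


def module_to_cpp_array(name, source, filename):
    """Compile, marshal and compress a module, then render it as a C++ array.

    Hypothetical helper; the symbol naming scheme is made up.
    """
    code = compile(source, filename, "exec")
    blob = zlib.compress(marshal.dumps(code))
    rows = []
    for start in range(0, len(blob), 12):
        chunk = blob[start:start + 12]
        rows.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(rows)
    cpp = (
        f"const unsigned char {name}_data[] = {{\n{body}\n}};\n"
        f"const unsigned long {name}_len = {len(blob)};\n"
    )
    return blob, cpp


def load_embedded(blob):
    """What the consuming side would do: decompress, unmarshal, execute."""
    namespace = {}
    exec(marshal.loads(zlib.decompress(blob)), namespace)
    return namespace


blob, cpp = module_to_cpp_array(
    "example_mod", "ANSWER = 6 * 7\n", "example_mod.py"
)
restored = load_embedded(blob)
print(restored["ANSWER"])
```

The round trip is only guaranteed to work when both halves run under the same python version, which is exactly why the script would be run by the wrapper around the embedded interpreter rather than by whatever python happens to be running SCons.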
**Drawback**

The biggest, and the only significant, drawback I see to this approach is that having python libraries around for embedded python will no longer be optional, at least during the build process.

**Benefits**

Guaranteed compatibility for built python code.
Simplified build system (not running arbitrary target python in SCons itself; a more rules-oriented, less sequentially scripted build process).
Reduced dependence on running arbitrary python as part of the build process.

Gabe
_______________________________________________
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org