Hi folks, particularly Bobby who made a change related to this fairly recently. Please offer some feedback, or I'll be forced to assume your silence is tacit agreement.
Gabe

On Wed, Jul 28, 2021 at 8:30 PM Gabe Black <gabe.bl...@gmail.com> wrote:

> Bump...
>
> On Sat, Jul 24, 2021 at 1:16 AM Gabe Black <gabe.bl...@gmail.com> wrote:
>
>> Hi folks.
>>
>> I think often when we think of how python is used by gem5, many people
>> (myself included) tend to think of it as a monolithic entity, or in other
>> words that there is just *the* python that gets used by everything. The
>> real situation is a little more complicated than that.
>>
>> On the one hand, we have "the" version of the python interpreter itself,
>> which is used to directly run python scripts. This is again an
>> approximation because we could, for instance, have one version of python
>> running SCons itself, and another version running scripts gem5 provides,
>> but for simplicity let's assume that's not the case, or at least that it
>> doesn't matter.
>>
>> Then, we have the python libraries which get built into gem5 and provide
>> its embedded interpreter. These may not exist at all on the target system,
>> or they could exist but be a different version than the one which you'd
>> get by running the command line "python foo.py".
>>
>>
>> **Marshal binary**
>>
>> The first place this has caused complication is where gem5's python code
>> is packaged up by the build system so that it can be embedded into gem5 in
>> either library form or executable form. Through digging around in the
>> repository, I've found that a long, long time ago, gem5 was only a binary,
>> and the python code was embedded into it by tacking a zip archive onto the
>> end where gem5 could find it. Apparently this doesn't work for libraries,
>> and so the python code needed to be bundled up and more formally injected
>> into the binary as data. Today this is done by turning the python code
>> into a binary representation, compressing it into a zip file, and then
>> building the resulting file into a byte array declared in c++.
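A minimal sketch of the packaging step described above, not gem5's actual build code: it compiles a module to a code object, serializes it with marshal, compresses it, and renders the bytes as a C++ array. The names make_blob, to_cpp_array and load_blob are hypothetical, and zlib stands in for the zip step here. The blob is only guaranteed to load under the same Python version that produced it.

```python
import marshal
import zlib


def make_blob(source, filename):
    """Compile Python source and return a compressed, marshalled code blob."""
    code = compile(source, filename, "exec")
    return zlib.compress(marshal.dumps(code))


def to_cpp_array(blob, symbol):
    """Render a blob as C++ source declaring a byte array (hypothetical layout)."""
    body = ", ".join(str(byte) for byte in blob)
    return (
        f"const unsigned char {symbol}[] = {{ {body} }};\n"
        f"const unsigned long {symbol}_len = {len(blob)};\n"
    )


def load_blob(blob):
    """Reverse make_blob: decompress, unmarshal, and execute the code object."""
    namespace = {}
    exec(marshal.loads(zlib.decompress(blob)), namespace)
    return namespace


if __name__ == "__main__":
    blob = make_blob("def answer():\n    return 42\n", "demo.py")
    print(to_cpp_array(blob, "demo_blob"))
    print(load_blob(blob)["answer"]())
```

Running make_blob under one interpreter and load_blob under another is exactly the case that can break, since the marshal format may differ between Python versions.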
>>
>> There are, to the first order, two main ways to serialize python "stuff"
>> into a string/byte array, pickling or marshalling. Both have their
>> advantages and disadvantages, partially covered here:
>>
>> https://docs.python.org/3/library/pickle.html#module-pickle
>>
>> The main deal breaker disadvantage of the pickle module is that it does
>> *not* handle code, and so all the code inside our python modules would be
>> lost, leaving only the data. The main disadvantage of marshal is that there
>> is *no* guarantee that it will be compatible between different versions of
>> python, since it's intended as a mostly internal representation, used, for
>> instance, for ".pyc" files.
>>
>> To solve this problem, Andreas added a "marshal" binary which is written
>> in c++, and includes the python interpreter like gem5 itself would. It runs
>> a small embedded script to use the appropriate version of the marshal
>> module to serialize the code we want. Since the code runs under the same
>> version of the interpreter as gem5 itself would use, when gem5 tries to use
>> the data it will be compatible.
>>
>>
>> **Collection of SimObject information**
>>
>> In order to figure out what SimObjects there are, we must actually import
>> all the *.py files which contain them that have been declared to the build
>> system. Once they've all been imported, the SimObject base class will have
>> collected a list of all of the new subclasses which can then be processed
>> by subsequent steps, generating params structs, etc.
>>
>> To do that requires a couple pieces of fairly heavy infrastructure within
>> the build system. First, we must, even before any of gem5's c++ components
>> have been built, build enough python infrastructure so that SimObjects can
>> work more or less like normal and import what they need, etc.
>> To support that, we have a complex custom import handler in the SConscript
>> which pipes around the .py SimObject modules we know about, with a little
>> extra to support m5.defines.buildEnv. Even with this complex infrastructure
>> in place, this only *mostly* covers what SimObjects might try to use. If
>> someone writes a SimObject and puts code in it to import _m5.foo blindly,
>> that *will* blow up, since _m5 is for c++ modules which don't exist and so
>> can't possibly be provided.
>>
>> Second, and perhaps more subtly since we assume certain capabilities in
>> our build system, this means our build system *must* be able to execute
>> python code, because it *must* be able to import these SimObject .py files
>> to know what to build and how to build it. This will make it impossible to
>> ever use a build system which does not allow the execution of arbitrary
>> python as part of its build flow, without, for instance, some sort of
>> elaborate harness which escapes to an external python interpreter and then
>> passes back complex data from doing the import itself.
>>
>>
>> **--without-python interaction with the gem5 library and binary**
>>
>> There is a fairly fundamental conflict between the gem5 library and the
>> gem5 binary as far as the --without-python flag. At one time, even when the
>> --without-python flag was provided, gem5 would look for the python
>> libraries for the embedded interpreter anyway, and fail if they were
>> absent. This was changed, implying that --without-python means you don't
>> want to use embedded python at all (different from script python, which you
>> need to even run SCons to begin with). Unfortunately, it's currently
>> impossible to build the gem5 binary without python, since its main
>> function runs as python and it has to run config scripts.
>> That means if you run this command:
>>
>>     scons --without-python build/X86/gem5.opt
>>
>> it will fail with an obscure build error and not tell you that this
>> combination is not allowed.
>>
>> When talking specifically about the library, --without-python can make
>> sense, because you're delegating the main function to somebody else, and
>> there is a c++ config *loading* mechanism.
>>
>> Note that I say a c++ config *loading* mechanism, since this mechanism
>> does not create a config from scratch, it simply loads an ini file created
>> earlier in a bootstrapping step. This implies that at least at some point,
>> you were able to build and run a version of gem5 which *did* have the
>> interpreter in it, even if that is not on the current machine.
>>
>>
>> **Incompatibility between script and binary python for SimObjects**
>>
>> Since we have two different pythons, one for scripts and one embedded, it
>> is entirely possible that one version supports some feature which the other
>> doesn't. Even though we require version 3.6 or greater of python, new
>> features are added all the time and may, for instance, have been introduced
>> in version 3.7 or 3.8. It is *not* guaranteed that when importing
>> SimObjects into SCons as part of the build process that we'll get even the
>> same SimObjects period, let alone equivalent definitions of those objects.
>>
>> For example, you could hypothetically check if a certain feature of
>> python was available, and if yes define SimObject A, and if no define
>> SimObject B.
>>
>>
>> **Suggested change**
>>
>> What I think might help solve this problem is to *require* there to be an
>> embeddable python interpreter no matter what, whether you're going to build
>> it into the library, or build the gem5 binary, or not.
>>
>> Then, we would expand the "marshal" program so that instead of being
>> custom made to marshal our python blobs, it would run arbitrary python code
>> using the built-in interpreter like gem5 would.
>> We could then use the same import handler as real gem5, removing one extra
>> copy. We could also teach it to import the blobs we've already made with
>> earlier invocations using the wrapper itself, or put code in the script we
>> run inside it which does the importing.
>>
>> Finally, we would require *all* code which is targeted at running inside
>> gem5 to be run by this interpreter wrapper. The wrapper would be able to
>> marshal python modules just like it does today, although with the small
>> script inside it provided from outside. It would also be able to import all
>> the SimObjects as blobs and report what SimObjects exist, and then it could
>> output the C++ code which defines the Param structs, etc.
>>
>>
>> **Drawback**
>>
>> The biggest, and only significant, drawback I see to this approach is that
>> having python libraries around for embedded python will no longer be
>> optional, at least during the build process.
>>
>>
>> **Benefits**
>>
>> Guaranteed compatibility for built python code. Simplified build system
>> (not running arbitrary target python in SCons itself, more rules
>> oriented/less sequentially scripted build process). Reduced dependence on
>> running arbitrary python as part of the build process.
>>
>> Gabe
>>
>
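To make the "import the blobs we've already made" idea concrete, here is a rough sketch of an import hook that serves modules from marshalled code blobs held in memory. It is not gem5's import handler; BlobImporter and make_blob are hypothetical names, and the sketch assumes the blobs were produced by the same Python version that loads them, which is what running everything through the embedded interpreter would guarantee.

```python
import importlib.abc
import importlib.util
import marshal
import sys


class BlobImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical importer serving modules from marshalled code blobs."""

    def __init__(self, blobs):
        # Maps module name -> bytes produced by marshal.dumps(code_object).
        self.blobs = dict(blobs)

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.blobs:
            return None  # Not ours; fall through to the normal importers.
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # Use Python's default module object.

    def exec_module(self, module):
        # Unmarshal the code object and run it in the module's namespace.
        code = marshal.loads(self.blobs[module.__name__])
        exec(code, module.__dict__)


def make_blob(source, name):
    """Compile source text and marshal the resulting code object."""
    return marshal.dumps(compile(source, f"<blob {name}>", "exec"))


if __name__ == "__main__":
    blobs = {"demo_params": make_blob("WIDTH = 8\n", "demo_params")}
    sys.meta_path.insert(0, BlobImporter(blobs))
    import demo_params
    print(demo_params.WIDTH)
```

Because the hook only claims names it has blobs for, anything else still resolves through the regular import machinery.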
_______________________________________________
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org