Bump...

On Sat, Jul 24, 2021 at 1:16 AM Gabe Black <gabe.bl...@gmail.com> wrote:

> Hi folks.
>
> I think often when we think of how python is used by gem5, many people
> (myself included) tend to think of it as a monolithic entity, or in other
> words that there is just *the* python that gets used by everything. The
> real situation is a little more complicated than that.
>
> On the one hand, we have "the" version of the python interpreter itself,
> which is used to directly run python scripts. This is again an
> approximation because we could, for instance, have one version of python
> running SCons itself, and another version running scripts gem5 provides,
> but for simplicity let's assume that's not the case, or at least that it
> doesn't matter.
>
> Then, we have the python libraries which get built into gem5 and provide
> its embedded interpreter. These may not exist at all on the target system,
> or they could exist but be a different version than the one you'd get by
> running the command line "python foo.py".
>
>
> **Marshal binary**
>
> The first place this has caused complication is where gem5's python code
> is packaged up by the build system so that it can be embedded into gem5 in
> either library form or executable form. Through digging around in the
> repository, I've found that a long, long time ago, gem5 was only a binary,
> and the python code was embedded into it by tacking a zip archive onto the
> end where gem5 could find it. Apparently this doesn't work for libraries,
> so the python code needed to be bundled up and more formally injected
> into the binary as data. Today this is done by turning the python code
> into a binary representation, compressing it into a zip file, and then
> building the resulting file into a byte array declared in C++.
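
A minimal sketch of that packaging step, for readers who haven't looked at the build code. It assumes zlib for the compression, and the helper names `pack_module_source` and `blob_to_cpp` and the generated C++ layout are hypothetical; this is not gem5's actual generator.

```python
import marshal
import zlib


def pack_module_source(source: str, filename: str) -> bytes:
    """Compile python source, marshal the code object, and compress it.

    zlib is an assumption here; the exact compression gem5 uses may differ.
    """
    code = compile(source, filename, "exec")
    return zlib.compress(marshal.dumps(code))


def blob_to_cpp(symbol: str, blob: bytes) -> str:
    """Render a blob as a C++ byte array definition (hypothetical layout)."""
    body = ", ".join(f"0x{byte:02x}" for byte in blob)
    return (
        f"const unsigned char {symbol}[] = {{{body}}};\n"
        f"const unsigned long {symbol}_len = {len(blob)};\n"
    )


if __name__ == "__main__":
    blob = pack_module_source("x = 6 * 7\n", "example.py")
    print(blob_to_cpp("example_py", blob))
```

The generated text would then be compiled into the library or binary like any other C++ source file.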
>
> There are, to a first approximation, two main ways to serialize python
> "stuff" into a string/byte array: pickling and marshalling. Both have their
> advantages and disadvantages, partially covered here:
>
> https://docs.python.org/3/library/pickle.html#module-pickle
>
> The main deal-breaker disadvantage of the pickle module is that it does
> *not* handle code, so all the code inside our python modules would be
> lost, leaving only the data. The main disadvantage of marshal is that there
> is *no* guarantee that it will be compatible between different versions of
> python, since it's intended as a mostly internal representation, used for
> instance by ".pyc" files.
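
The difference is easy to see directly. This small demo (the function `double` is just an example) round-trips a code object through marshal, while pickle rejects the same object with a `TypeError`, since pickle stores functions only by reference to their importable name.

```python
import marshal
import pickle
import types


def double(x):
    return 2 * x


code = double.__code__

# marshal serializes the compiled code object itself...
restored = marshal.loads(marshal.dumps(code))
rebuilt = types.FunctionType(restored, globals())

# ...while pickle refuses: it never stores bytecode, only data and
# references to importable names.
try:
    pickle.dumps(code)
    pickle_handles_code = True
except TypeError:
    pickle_handles_code = False

print(rebuilt(21), pickle_handles_code)  # prints: 42 False
```

The catch, per the marshal documentation, is that those bytes are only guaranteed to load under the same Python version that produced them.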
>
> To solve this problem, Andreas added a "marshal" binary which is written
> in C++ and embeds the python interpreter just like gem5 itself does. It
> runs a small embedded script which uses the appropriate version of the
> marshal module to serialize the code we want. Since the code runs under the
> same version of the interpreter as gem5 itself would use, the data will be
> compatible when gem5 tries to use it.
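
A hypothetical loader-side sketch that makes this version coupling explicit. It stamps each blob with `importlib.util.MAGIC_NUMBER` (the tag CPython writes at the start of `.pyc` files, which changes whenever the bytecode format does) and refuses to load a blob produced by a different interpreter. The header format and the `pack`/`load_module` names are invented for illustration; the marshal binary described above avoids the problem by construction rather than by checking.

```python
import importlib.util
import marshal
import sys
import types
import zlib

# Identifies the bytecode format of the running interpreter.
MAGIC = importlib.util.MAGIC_NUMBER


def pack(source: str, filename: str) -> bytes:
    """Marshal and compress a module, prefixed with the magic number."""
    code = compile(source, filename, "exec")
    return MAGIC + zlib.compress(marshal.dumps(code))


def load_module(name: str, blob: bytes) -> types.ModuleType:
    """Rebuild a module from a blob, rejecting foreign bytecode versions."""
    magic, payload = blob[:len(MAGIC)], blob[len(MAGIC):]
    if magic != MAGIC:
        raise ImportError(f"{name}: blob was marshalled by another Python version")
    code = marshal.loads(zlib.decompress(payload))
    module = types.ModuleType(name)
    module.__file__ = code.co_filename
    sys.modules[name] = module
    exec(code, module.__dict__)
    return module


if __name__ == "__main__":
    mod = load_module("demo_mod", pack("answer = 40 + 2\n", "demo_mod.py"))
    print(mod.answer)
```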
>
>
> **Collection of SimObject information**
>
> In order to figure out what SimObjects there are, we must actually import
> all the *.py files declared to the build system which contain them. Once
> they've all been imported, the SimObject base class will have collected a
> list of all of the new subclasses, which can then be processed by
> subsequent steps that generate params structs, etc.
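
A stripped-down sketch of how a base class can collect its subclasses as they are imported, using a metaclass. `MetaSimObject`, `registry`, and the example classes are illustrative names; gem5's real SimObject machinery does considerably more.

```python
class MetaSimObject(type):
    """Metaclass that records every SimObject subclass as it is defined."""

    registry = {}

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # The root class has no bases; only register its subclasses.
        if bases:
            MetaSimObject.registry[name] = cls


class SimObject(metaclass=MetaSimObject):
    pass


# What importing a SimObject .py file effectively does:
class BaseCache(SimObject):
    size = "64kB"  # hypothetical parameter


class L2Cache(BaseCache):
    size = "1MB"


# Later build steps can walk the registry to generate params structs, etc.
print(sorted(MetaSimObject.registry))
```

The key point is that the registry is only complete once every file has been imported, which is why the build system has to import them all.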
>
> Doing that requires a couple pieces of fairly heavy infrastructure within
> the build system. First, even before any of gem5's C++ components have
> been built, we must build enough python infrastructure that SimObjects can
> work more or less like normal, import what they need, etc. To support
> that, we have a complex custom import handler in the SConscript which
> pipes around the .py SimObject modules we know about, with a little extra
> to support m5.defines.buildEnv. Even with this complex infrastructure in
> place, this only *mostly* covers what SimObjects might try to use. If
> someone writes a SimObject and puts code in it to import _m5.foo blindly,
> that *will* blow up, since _m5 is for C++ modules which don't exist yet
> and so can't possibly be provided.
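
A rough sketch of the kind of import hook involved, using the standard `importlib` meta path API: modules are served from an in-memory `{name: source}` dict, and anything not in it falls through to the normal machinery, which is why a blind `import _m5...` still fails. `InMemoryFinder`, `sim_defines`, and `my_cache` are hypothetical names; this is not the actual SConscript handler.

```python
import importlib
import importlib.abc
import importlib.util
import sys


class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serves modules from a {name: source} dict (hypothetical stand-in)."""

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.sources:
            return None  # not ours; let the regular finders try
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        source = self.sources[module.__name__]
        code = compile(source, f"<{module.__name__}>", "exec")
        exec(code, module.__dict__)


sys.meta_path.insert(0, InMemoryFinder({
    # Stand-in for m5.defines.buildEnv.
    "sim_defines": "buildEnv = {'TARGET_ISA': 'x86'}\n",
    # A SimObject file that imports something the handler provides.
    "my_cache": "from sim_defines import buildEnv\nISA = buildEnv['TARGET_ISA']\n",
}))

import my_cache  # noqa: E402  (served by InMemoryFinder)

print(my_cache.ISA)

# A blind import of a C++-backed module still fails at build time.
try:
    importlib.import_module("_m5.core")
except ModuleNotFoundError as err:
    print("build-time import failed:", err)
```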
>
> Second, and perhaps more subtly since we take certain capabilities of our
> build system for granted, this means our build system *must* be able to
> execute python code, because it *must* be able to import these SimObject
> .py files to know what to build and how to build it. This will make it
> impossible to ever use a build system which does not allow the execution of
> arbitrary python as part of its build flow, without, for instance, some
> sort of elaborate harness which escapes to an external python interpreter,
> does the import there, and passes back complex data.
>
>
> **--without-python interaction with the gem5 library and binary**
>
> There is a fairly fundamental conflict between the gem5 library and the
> gem5 binary as far as the --without-python flag is concerned. At one time,
> even when the --without-python flag was provided, gem5 would look for the
> python libraries for the embedded interpreter anyway, and fail if they were
> absent. This was changed, implying that --without-python means you don't
> want to use embedded python at all (as distinct from script python, which
> you need to even run SCons to begin with). Unfortunately, it's currently
> impossible to build the gem5 binary without python, since its main
> function runs as python and it has to run config scripts. That means if you
> run this command:
>
> scons --without-python build/X86/gem5.opt
>
> it will fail with an obscure build error instead of telling you that this
> combination is not allowed.
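
One way to turn that into a clear error would be an up-front check before any targets are built. This is a hypothetical sketch: `check_without_python`, `BuildConfigError`, and the rule that executables are named `gem5.<variant>` are assumptions, and from an SConstruct it might be fed SCons's `COMMAND_LINE_TARGETS` along with the option's value.

```python
import os


class BuildConfigError(Exception):
    """Raised for target/flag combinations that can never build."""


def check_without_python(targets, without_python):
    """Reject gem5 executables when embedded python is disabled."""
    if not without_python:
        return
    for target in targets:
        # Hypothetical naming rule: executables look like gem5.opt, gem5.debug.
        if os.path.basename(target).startswith("gem5."):
            raise BuildConfigError(
                f"{target}: the gem5 binary runs main() and config scripts "
                "in embedded python, so it cannot be built with "
                "--without-python")


if __name__ == "__main__":
    try:
        check_without_python(["build/X86/gem5.opt"], without_python=True)
    except BuildConfigError as err:
        print(err)
```

Library targets pass the check unchanged, matching the point that --without-python only makes sense for the library.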
>
> When talking specifically about the library, --without-python can make
> sense, because you're delegating the main function to somebody else, and
> there is a C++ config *loading* mechanism.
>
> Note that I say a C++ config *loading* mechanism, since this mechanism
> does not create a config from scratch; it simply loads an ini file created
> earlier in a bootstrapping step. This implies that at some point you were
> able to build and run a version of gem5 which *did* have the interpreter
> in it, even if not on the current machine.
>
>
> **Incompatibility between script and binary python for SimObjects**
>
> Since we have two different pythons, one for scripts and one embedded, it
> is entirely possible that one version supports some feature which the other
> doesn't. Even though we require python 3.6 or greater, new features are
> added all the time and may, for instance, have been introduced in version
> 3.7 or 3.8. When SimObjects are imported into SCons as part of the build
> process, it is *not* guaranteed that we'll even get the same set of
> SimObjects, let alone equivalent definitions of them.
>
> For example, you could hypothetically check whether a certain feature of
> python was available, and define SimObject A if so and SimObject B if not.
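
A concrete, hypothetical illustration of that case, using `functools.cached_property` (added in Python 3.8) as the feature being probed. `SimObjectA` and `SimObjectB` are placeholder names, and `SimObject` here is a bare stand-in class.

```python
import functools


class SimObject:
    """Bare stand-in for gem5's SimObject base class."""


# The set of SimObjects this file defines depends on the interpreter running it.
if hasattr(functools, "cached_property"):  # available from Python 3.8
    class SimObjectA(SimObject):
        pass
else:
    class SimObjectB(SimObject):
        pass

defined = sorted(cls.__name__ for cls in SimObject.__subclasses__())
print(defined)
```

With a 3.7 interpreter running SCons and a 3.8 embedded one, the build would only see `SimObjectB`, while gem5 at run time would only see `SimObjectA`.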
>
>
> **Suggested change**
>
> What I think might help solve this problem is to *require* an embeddable
> python interpreter no matter what, whether or not you're going to build it
> into the library or build the gem5 binary.
>
> Then, we would expand the "marshal" program so that instead of being
> custom made to marshal our python blobs, it would run arbitrary python code
> using the built-in interpreter just like gem5 would. We could then use the
> same import handler as real gem5, removing one extra copy. We could also
> teach it to import the blobs we've already made with earlier invocations,
> either in the wrapper itself or with code in the script we run inside it.
>
> Finally, we would require *all* code which is targeted at running inside
> gem5 to be run by this interpreter wrapper. The wrapper would be able to
> marshal python modules just like it does today, although the small script
> it runs would be provided from outside. It would also be able to import all
> the SimObjects as blobs, report what SimObjects exist, and then output the
> C++ code which defines the Params structs, etc.
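
A hypothetical sketch of that last step, emitting a C++ params struct from SimObject information collected by the wrapper. The `SimObjectInfo` record, the python-to-C++ type table, and the struct layout are all invented for illustration; real params structs carry much more than this.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical mapping from python parameter types to C++ member types.
CXX_TYPES = {int: "int", bool: "bool", float: "double", str: "std::string"}


@dataclass
class SimObjectInfo:
    """What the wrapper might report for each SimObject it found."""
    name: str
    params: Dict[str, type]


def params_struct(info: SimObjectInfo) -> str:
    """Render one <Name>Params struct with one member per parameter."""
    lines = [f"struct {info.name}Params", "{"]
    for pname, ptype in info.params.items():
        lines.append(f"    {CXX_TYPES[ptype]} {pname};")
    lines.append("};")
    return "\n".join(lines) + "\n"


def generate_header(infos: List[SimObjectInfo]) -> str:
    """Concatenate all structs into one header body."""
    return "#include <string>\n\n" + "\n".join(params_struct(i) for i in infos)


if __name__ == "__main__":
    print(generate_header([SimObjectInfo("Cache", {"size": int, "writeback": bool})]))
```

Inside the proposed wrapper, the list passed to `generate_header` would come from importing the marshalled SimObject modules rather than being written out by hand.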
>
>
> **Drawback**
>
> The biggest, and only significant, drawback I see to this approach is that
> having python libraries around for embedded python will no longer be
> optional, at least during the build process.
>
>
> **Benefits**
>
> Guaranteed compatibility for built python code. A simplified build system
> (not running arbitrary target python in SCons itself, and a more
> rules-oriented, less sequentially scripted build process). Reduced
> dependence on running arbitrary python as part of the build process.
>
> Gabe
>
_______________________________________________
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org