Hi folks, particularly Bobby who made a change related to this fairly
recently. Please offer some feedback, or I'll be forced to assume your
silence is tacit agreement.

Gabe

On Wed, Jul 28, 2021 at 8:30 PM Gabe Black <gabe.bl...@gmail.com> wrote:

> Bump...
>
> On Sat, Jul 24, 2021 at 1:16 AM Gabe Black <gabe.bl...@gmail.com> wrote:
>
>> Hi folks.
>>
>> I think often when we think of how python is used by gem5, many people
>> (myself included) tend to think of it as a monolithic entity, or in other
>> words that there is just *the* python that gets used by everything. The
>> real situation is a little more complicated than that.
>>
>> On the one hand, we have "the" version of the python interpreter itself,
>> which is used to directly run python scripts. This is again an
>> approximation because we could, for instance, have one version of python
>> running SCons itself, and another version running scripts gem5 provides,
>> but for simplicity let's assume that's not the case, or at least that it
>> doesn't matter.
>>
>> Then, we have the python libraries which get built into gem5 and provide
>> its embedded interpreter. This may not exist at all on the target system,
>> or it could exist but be a different version than the one which you'd get
>> by running the command line "python foo.py".
>>
>>
>> **Marshal binary**
>>
>> The first place this has caused complication is where gem5's python code
>> is packaged up by the build system so that it can be embedded into gem5 in
>> either library form or executable form. Through digging around in the
>> repository, I've found that a long, long time ago, gem5 was only a binary,
>> and the python code was embedded into it by tacking a zip archive onto the
>> end where gem5 could find it. Apparently this doesn't work for libraries,
>> and so the python code needed to be bundled up and more formally injected
>> into the binary as data. Today this is done by turning the python code into a
>> binary representation, compressing it into a zip file, and then building
>> the resulting file into a byte array declared in c++.
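A rough sketch of that packaging step, in case it helps picture it. The use of zlib for the compression, the function names, and the array name are all assumptions made for illustration; the real build scripts may do this differently.

```python
import marshal
import zlib


def pack_module(source, filename):
    """Compile source text and return it as compressed, marshalled bytes."""
    code = compile(source, filename, "exec")
    return zlib.compress(marshal.dumps(code))


def unpack_module(blob):
    """Reverse pack_module and return the code object."""
    return marshal.loads(zlib.decompress(blob))


def emit_cpp_array(blob, array_name):
    """Render the bytes as C++ source declaring a byte array and its length."""
    body = ", ".join(str(byte) for byte in blob)
    return (
        f"const unsigned char {array_name}[] = {{ {body} }};\n"
        f"const unsigned int {array_name}_len = {len(blob)};\n"
    )


# Hypothetical module contents and file name, for illustration only.
blob = pack_module("answer = 6 * 7\n", "hypothetical_module.py")
cpp_text = emit_cpp_array(blob, "embedded_blob")
```

The round trip only works if `unpack_module` runs under a python that understands the marshal format `pack_module` wrote, which is the compatibility issue discussed next.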
>>
>> There are, to a first order, two main ways to serialize python "stuff"
>> into a string/byte array: pickling or marshalling. Both have their
>> advantages and disadvantages, partially covered here:
>>
>> https://docs.python.org/3/library/pickle.html#module-pickle
>>
>> The main deal breaker disadvantage of the pickle module is that it does
>> *not* handle code, and so all the code inside our python modules would be
>> lost, leaving only the data. The main disadvantage of marshal is that there
>> is *no* guarantee that it will be compatible between different versions of
>> python, since it's intended as a mostly internal representation, used for
>> things like ".pyc" files.
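A small illustration of that difference, using only the standard library. The module source and file name are made up for the example.

```python
import marshal
import pickle

# Hypothetical module source, compiled to a code object.
code = compile(
    "def double(x):\n    return 2 * x\n", "hypothetical_module.py", "exec"
)


def pickle_refuses(obj):
    """Return True if pickle cannot serialize obj."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return True
    return False


# pickle has no way to serialize a code object.
pickle_rejected = pickle_refuses(code)

# marshal can, but only promises that the bytes load back under the same
# python version that wrote them.
restored = marshal.loads(marshal.dumps(code))
namespace = {}
exec(restored, namespace)
result = namespace["double"](21)
```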
>>
>> To solve this problem, Andreas added a "marshal" binary which is written
>> in c++, and includes the python interpreter like gem5 itself would. It runs
>> a small embedded script to use the appropriate version of the marshal
>> module to serialize the code we want. Since the code runs under the same
>> version of the interpreter as gem5 itself would use, when gem5 tries to use
>> the data it will be compatible.
>>
>>
>> **Collection of SimObject information**
>>
>> In order to figure out what SimObjects there are, we must actually import
>> all the *.py files which contain them that have been declared to the build
>> system. Once they've all been imported, the SimObject base class will have
>> collected a list of all of the new subclasses which can then be processed
>> by subsequent steps, generating params structs, etc.
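One way a base class can collect its subclasses as a side effect of import, sketched with a metaclass. The class names here are stand-ins chosen for the example, not gem5's actual classes.

```python
class RegistryMeta(type):
    """Metaclass that records every class created with it."""

    all_classes = {}

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        RegistryMeta.all_classes[name] = cls


class BaseObject(metaclass=RegistryMeta):
    """Stand-in for the SimObject base class."""


# In the real flow these definitions would live in separate .py files and
# get registered simply by being imported.
class ExampleCache(BaseObject):
    pass


class ExampleCpu(BaseObject):
    pass


def collected_subclasses():
    """Names of everything registered, excluding the base class itself."""
    return sorted(
        name
        for name, cls in RegistryMeta.all_classes.items()
        if cls is not BaseObject
    )
```

The point is that the list only exists after the defining modules have actually been executed, which is why the build has to import them.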
>>
>> To do that requires a couple pieces of fairly heavy infrastructure within
>> the build system. First, we must, even before any of gem5's c++ components
>> have been built, build enough python infrastructure so that SimObjects can
>> work more or less like normal and import what they need, etc. To support
>> that, we have a complex custom import handler in the SConscript
>> which pipes around the .py SimObject modules we know about, with a little
>> extra to support m5.defines.buildEnv. Even with this complex infrastructure
>> in place, this only *mostly* covers what SimObjects might try to use. If
>> someone writes a SimObject and puts code in it to import _m5.foo blindly,
>> that *will* blow up, since _m5 is for c++ modules which don't exist and so
>> can't possibly be provided.
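For anyone who hasn't looked at it, the general shape of such an import handler is roughly the following. This is a simplified sketch built on the standard importlib hooks, not the code in the SConscript, and every module name in it is hypothetical.

```python
import importlib
import importlib.abc
import importlib.util
import sys


class SourceDictImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve modules from a dict of name -> source text."""

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, fullname, path, target=None):
        # Only claim the modules we know about; defer everything else.
        if fullname in self.sources:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        return None  # Use the default module creation.

    def exec_module(self, module):
        source = self.sources[module.__name__]
        code = compile(source, module.__name__ + ".py", "exec")
        exec(code, module.__dict__)


sources = {
    "hypo_objects": "NAME = 'ExampleCache'\n",
    # Stands in for an object file which blindly imports a native module.
    "hypo_broken": "import _hypo_native\n",
}
sys.meta_path.insert(0, SourceDictImporter(sources))

hypo_objects = importlib.import_module("hypo_objects")

try:
    importlib.import_module("hypo_broken")
    broken_import_failed = False
except ImportError:
    # Nothing can provide the native module at this stage of the build.
    broken_import_failed = True
```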
>>
>> Second, and perhaps more subtly since we assume certain capabilities in
>> our build system, this means our build system *must* be able to execute
>> python code, because it *must* be able to import these SimObject .py files
>> to know what to build and how to build it. This will make it impossible to
>> ever use a build system which does not allow the execution of arbitrary
>> python as part of its build flow, without, for instance, some sort of
>> elaborate harness which escapes to an external python interpreter and then
>> passes back complex data from doing the import itself.
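A bare-bones version of such a harness might look like this. It is only a sketch: the child script, the class names, and the choice of JSON on stdout as the transport are all assumptions for illustration.

```python
import json
import subprocess
import sys

# Script run in the external interpreter. It defines (in practice, would
# import) the objects and reports their names as JSON on stdout.
CHILD_SCRIPT = """
import json

class BaseObject:
    pass

class ExampleCache(BaseObject):
    pass

class ExampleCpu(BaseObject):
    pass

names = sorted(cls.__name__ for cls in BaseObject.__subclasses__())
print(json.dumps(names))
"""


def list_objects(interpreter=sys.executable):
    """Run the child script under the given interpreter and parse its output."""
    result = subprocess.run(
        [interpreter, "-c", CHILD_SCRIPT],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


names = list_objects()
```

Anything richer than a list of names (parameter types, defaults, and so on) would need a more elaborate encoding, which is where the "complex data" part comes in.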
>>
>>
>> **--without-python interaction with the gem5 library and binary**
>>
>> There is a fairly fundamental conflict between the gem5 library and the
>> gem5 binary regarding the --without-python flag. At one time, even when the
>> --without-python flag was provided, gem5 would look for the python
>> libraries for the embedded interpreter anyway, and fail if they were
>> absent. This was changed, implying that --without-python means you don't
>> want to use embedded python at all (different from script python, which you
>> need to even run SCons to begin with). Unfortunately, it's currently
>> impossible to build the gem5 binary without python, since its main
>> function runs as python and it has to run config scripts. That means if you
>> run this command:
>>
>> scons --without-python build/X86/gem5.opt
>>
>> it will fail with an obscure build error and not tell you that that
>> combination is not allowed.
>>
>> When talking specifically about the library, --without-python can make
>> sense, because you're delegating the main function to somebody else, and
>> there is a c++ config *loading* mechanism.
>>
>> Note that I say a c++ config *loading* mechanism, since this mechanism
>> does not create a config from scratch, it simply loads an ini file created
>> earlier in a bootstrapping step. This implies that at least at some point,
>> you were able to build and run a version of gem5 which *did* have the
>> interpreter in it, even if that is not on the current machine.
>>
>>
>> **Incompatibility between script and binary python for SimObjects**
>>
>> Since we have two different pythons, one for scripts and one embedded, it
>> is entirely possible that one version supports some feature which the other
>> doesn't. Even though we require version 3.6 or greater of python, new
>> features are added all the time and may, for instance, have been introduced
>> in version 3.7 or 3.8. It is *not* guaranteed that when importing
>> SimObjects into SCons as part of the build process that we'll get even the
>> same SimObjects, period, let alone equivalent definitions of those objects.
>>
>> For example, you could hypothetically check if a certain feature of
>> python was available, and if yes define SimObject A, and if no define
>> SimObject B.
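A contrived version of that hypothetical, with stand-in class names and an arbitrary version cutoff:

```python
import sys


class BaseObject:
    """Stand-in for the SimObject base class."""


# Which class exists depends entirely on the interpreter doing the import.
if sys.version_info >= (3, 8):
    class ObjectA(BaseObject):
        pass
else:
    class ObjectB(BaseObject):
        pass

defined = sorted(cls.__name__ for cls in BaseObject.__subclasses__())
```

If the python running SCons and the python embedded in gem5 fall on opposite sides of the check, the build generates code for one class while the running binary defines the other.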
>>
>>
>> **Suggested change**
>>
>> What I think might help solve this problem is to *require* there to be an
>> embeddable python interpreter no matter what, whether you're going to build
>> it into the library, or build the gem5 binary, or not.
>>
>> Then, we would expand the "marshal" program so that instead of being
>> custom made to marshal our python blobs, it would run arbitrary python code
>> using the built in interpreter like gem5 would. We could then use the same
>> import handler as real gem5, removing one extra copy. We could also teach
>> it to import the blobs we've already made with earlier invocations using
>> the wrapper itself, or put code in the script we run inside it which does
>> the importing.
>>
>> Finally, we would require *all* code which is targeted at running inside
>> gem5 to be run by this interpreter wrapper. The wrapper would be able to
>> marshal python modules just like it does today, although with the small
>> script inside it provided from outside. It would also be able to import all
>> the SimObjects as blobs and report what SimObjects exist, and then it could
>> output the C++ code which defines the Param structs, etc.
>>
>>
>> **Drawback**
>>
>> The biggest, and only significant, drawback I see to this approach is that
>> this will mean having python libraries around for embedded python will no
>> longer be optional, at least during the build process.
>>
>>
>> **Benefits**
>>
>> Guaranteed compatibility for built python code. Simplified build system
>> (not running arbitrary target python in SCons itself, more rules
>> oriented/less sequentially scripted build process). Reduced dependence on
>> running arbitrary python as part of the build process.
>>
>> Gabe
>>
>
_______________________________________________
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org