Hey Vitaly,

I'm not sure what the best solution here is. Ideally, we want both pycapnp
and nupic to always link to the exact same compiled version of C++ Cap'n
Proto.

Perhaps it should be pycapnp's responsibility to always bundle a compiled
distribution of C++ Cap'n Proto, complete with headers. Then in nupic's
build process, it could call some pycapnp function that supplies said
directories (similar to how numpy.get_include() works). Does that sound
reasonable to you?
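
To sketch what "the exact same compiled version" could mean in practice, here is a minimal C++ guard. It assumes each extension compiles in a hypothetical `BuildFingerprint` (the compiler's `__VERSION__` string plus libstdc++'s `_GLIBCXX_USE_CXX11_ABI` setting, the dual-ABI macro introduced with gcc 5), and that nupic compares its own fingerprint against pycapnp's before casting any pycapnp object. `BuildFingerprint`, `currentFingerprint()`, and `sameBuild()` are made-up names, not pycapnp or nupic APIs. A matching fingerprint is necessary but not sufficient for layout compatibility.

```cpp
#include <cstring>

// Hypothetical record of the toolchain a translation unit was built with.
// Each extension (pycapnp and nupic.bindings) would compile its own copy.
struct BuildFingerprint {
  const char* compilerVersion;  // the __VERSION__ string, if available
  int cxx11Abi;                 // libstdc++ dual-ABI setting, -1 if unknown
};

// Captures the fingerprint of *this* translation unit at compile time.
inline BuildFingerprint currentFingerprint() {
  BuildFingerprint fp;
#ifdef __VERSION__
  fp.compilerVersion = __VERSION__;
#else
  fp.compilerVersion = "unknown";
#endif
#ifdef _GLIBCXX_USE_CXX11_ABI
  fp.cxx11Abi = _GLIBCXX_USE_CXX11_ABI;  // defined by libstdc++ headers (gcc >= 5)
#else
  fp.cxx11Abi = -1;  // older libstdc++, or a different standard library
#endif
  return fp;
}

// Two builds count as "the same" only if every recorded field matches.
// Other flags (e.g. -D defines) could still change layouts, so this is a
// necessary check, not a sufficient one.
inline bool sameBuild(const BuildFingerprint& a, const BuildFingerprint& b) {
  return std::strcmp(a.compilerVersion, b.compilerVersion) == 0 &&
         a.cxx11Abi == b.cxx11Abi;
}
```

In use, nupic would call `sameBuild(currentFingerprint(), fingerprintFromPycapnp)`, where the second argument is a hypothetical value exported by pycapnp, and refuse to cast on a mismatch. Bundling a single compiled Cap'n Proto, as proposed above, would make such a check largely redundant.
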

On Thu, Jul 28, 2016 at 6:17 PM, vitaly numenta <[email protected]> wrote:

> We see errors converting pycapnp builders to C++ capnp builders on Ubuntu
> 16.04 when using our Python extensions compiled under the "manylinux"
> environment (CentOS-6.8 with gcc 4.8.2).
>
> We pass a pycapnp builder to C++ using this code:
> https://github.com/numenta/nupic.core/blob/064f8b1ef003d5ee07405cd5ac41583f83ab1d35/src/nupic/py_support/PyCapnp.hpp#L71
>
> When we cast the schema parser to `pycapnp_SchemaParser*` and deref the
> `thisptr` attribute, the values appear bogus, suggesting an incorrect cast.
>
> pycapnp was installed on Ubuntu 16.04; its installation builds the
> extensions and capnproto using gcc 5.4.0.
>
> Is it possible that the SchemaParser or SchemaLoader struct from the pycapnp
> extension built with gcc 5.4.0 has a different alignment/layout than the one
> expected by the cast in the NuPIC C extension compiled with gcc 4.8.2?
>
>
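
To make the layout question concrete, here is a minimal C++ sketch: two hypothetical definitions of the "same" object, `LayoutA` and `LayoutB` (made up for illustration, not the real `capnp::SchemaLoader`), differ by one leading member. Bytes written according to one layout and read back through the other land at the wrong offsets. The sketch copies bytes with `std::memcpy` instead of casting pointers so that the example itself stays well defined.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layout as build A sees the object: an extra leading member.
struct LayoutA {
  std::uint64_t extra;
  std::uint32_t futex;
  std::uint64_t ptr;
};

// Hypothetical layout of the "same" object as build B sees it.
struct LayoutB {
  std::uint32_t futex;
  std::uint64_t ptr;
};

static_assert(sizeof(LayoutB) <= sizeof(LayoutA),
              "B's view must fit inside A's bytes for this demo");

// Writes an A-laid-out object into a byte buffer field by field, zeroing
// the buffer first so padding bytes are well defined.
inline void writeAsA(unsigned char* out, std::uint64_t extra,
                     std::uint32_t futex, std::uint64_t ptr) {
  std::memset(out, 0, sizeof(LayoutA));
  std::memcpy(out + offsetof(LayoutA, extra), &extra, sizeof extra);
  std::memcpy(out + offsetof(LayoutA, futex), &futex, sizeof futex);
  std::memcpy(out + offsetof(LayoutA, ptr), &ptr, sizeof ptr);
}

// Interprets the same bytes through build B's definition, roughly what a
// cast between the two builds' types ends up doing.
inline LayoutB readAsB(const unsigned char* in) {
  LayoutB b;
  std::memcpy(&b, in, sizeof b);
  return b;
}
```

For example, after `writeAsA(buf, 0x1122334455667788, 0, 0)`, `readAsB(buf).futex` is nonzero (it picks up part of `extra`), even though build A stored 0 in its own `futex`.
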
> More details...
>
> First, an overview of capnp's integration into nupic and nupic.bindings.
> The `nupic` pure Python package gets capnp via the `pycapnp==0.5.8` package,
> which contains its own version of compiled `capnproto` sources.
> `nupic.bindings`, a Python extension built in `nupic.core`, includes its own
> version of capnp 0.5.3 sources compiled into the extension's shared
> libraries, such as `_algorithms.so`, `_math.so`, etc. nupic.bindings contains
> the C++ implementation of classes and supporting logic used by nupic.
>
> So, when nupic is used, there are two versions of compiled capnproto in
> play: one from pycapnp, imported by nupic, and another built into the
> nupic.bindings extension. On the Ubuntu 16.04 system, pycapnp's capnproto
> C++ sources were compiled with gcc/g++ 5.4.0 during installation of pycapnp
> on that system. The capnproto C++ sources in nupic.bindings were compiled on
> CentOS-6.8 using gcc/g++ 4.8.2 during the build of the "manylinux"
> nupic.bindings wheel. Note that those toolchains are a MAJOR VERSION apart,
> and the two extensions compile the capnproto C++ sources independently,
> each with its own set of compiler/linker flags and options (not to mention
> that the two versions of capnp sources, although similar, might not be
> identical).
>
> When nupic wants to serialize a nupic.bindings-based object, nupic passes
> the Python Builder object instantiated by pycapnp to the nupic.bindings
> Python extension, whose C++ code extracts the C++ Builder from the Python
> Builder. For example, in the case of the Random class, nupic.bindings'
> _math.so extracts the C++ RandomProto::Builder instance from the Python
> Builder instance at
> https://github.com/numenta/nupic.core/blob/0.4.4/src/nupic/bindings/math.i#L374-L375,
> then passes the extracted builder instance to the C++ Random object's
> `write` method for serialization.
>
> So, the nupic.bindings extension's shared libs pass C++ capnp objects
> instantiated by pycapnp's build of capnp to nupic.bindings-based methods
> that act on those capnp objects using methods in nupic.bindings' own build
> of capnproto. To reiterate, objects instantiated by pycapnp's build of
> capnproto are being operated on by methods in nupic.bindings' own build of
> capnproto code.
>
> This integration happens to work when pycapnp and nupic.bindings are both
> compiled/linked on the same platform. It also seems to work when the two are
> compiled/linked with nearby toolchain versions, such as pycapnp built on
> Ubuntu 14.04 with gcc/g++ 4.8.4 and nupic.bindings built on CentOS-6.8 with
> gcc/g++ 4.8.2.
>
> However, the integration misbehaves when installed on Ubuntu Server 16.04.
> In this case, pycapnp==0.5.8 is built (as the result of installation from
> PyPI) on Ubuntu 16.04 by gcc/g++ 5.4.0, but the manylinux nupic.bindings
> wheel was built on CentOS-6.8 using gcc/g++ 4.8.2. The detailed root-cause
> analysis is in
> https://github.com/numenta/nupic.core/issues/1013#issuecomment-235736477
> (look for "ROOT-CAUSE ANALYSIS" in that GitHub issue). The short version is:
>
> 1. nupic.bindings extracts the C++ capnp Builder object from the Python
> Builder that was instantiated by the pycapnp Python extension.
> nupic.bindings uses this function, linked into _math.so, to extract the C++
> Builder object:
>
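> ```
> template<class T> typename T::Builder getBuilder(PyObject* pyBuilder)
> {
>     // Borrowed reference to pycapnp's compiled module.
>     PyObject* capnpModule = PyImport_AddModule("capnp.lib.capnp");
>     // New reference to pycapnp's module-level SchemaParser wrapper.
>     PyObject* pySchemaParser =
>         PyObject_GetAttrString(capnpModule, "_global_schema_parser");
>
>     // Reinterpret the Python object as pycapnp's object struct and load T's
>     // compiled-in schema (and dependencies) into the C++ capnp::SchemaParser.
>     pycapnp_SchemaParser* schemaParser = (pycapnp_SchemaParser*)pySchemaParser;
>     schemaParser->thisptr->loadCompiledTypeAndDependencies<T>();
>     // The module still holds the parser, so our own reference can be dropped.
>     Py_XDECREF(pySchemaParser);
>
>     // Reinterpret the Python builder and extract its C++ DynamicStruct::Builder.
>     pycapnp_DynamicStructBuilder* dynamicStruct =
>         (pycapnp_DynamicStructBuilder*)pyBuilder;
>     capnp::DynamicStruct::Builder& builder = dynamicStruct->thisptr;
>     typename T::Builder proto = builder.as<T>();
>     return proto;
> }
> ```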
>
> 2. The statement `schemaParser->thisptr->loadCompiledTypeAndDependencies<T>()`
> invokes the `capnp::SchemaParser::loadCompiledTypeAndDependencies()` method
> on `thisptr`, which is a pointer to the `capnp::SchemaParser` instance
> instantiated by pycapnp's capnp code.
>
> 3. However, because `nupic::getBuilder<RandomProto>` is compiled into
> nupic.bindings' Python extension (in _math.so, in this case), which includes
> its own version of capnp, the call to
> `capnp::SchemaParser::loadCompiledTypeAndDependencies<T>()` resolved to the
> capnp code in _math.so instead of the capnp code in the pycapnp build that
> instantiated this `capnp::SchemaParser` object.
>
> 4. This is where things get hairy: when we use gdb to examine the contents
> of the `capnp::SchemaLoader` referenced by the extracted
> `capnp::SchemaParser` (which was instantiated by pycapnp's capnp code) at
> the point where `capnp::SchemaLoader::loadNative` is called inside
> nupic.bindings' own build of capnp, we observe that the instance member
> contents don't make any sense. There is apparently some mismatch between
> the `capnp::SchemaLoader` object instantiated by pycapnp's capnp code (built
> with g++ 5.4.0) and the corresponding `capnp::SchemaLoader` class in the
> manylinux nupic.bindings wheel (built with g++ 4.8.2):
>
> ```
> (gdb) p this
> $17 = (capnp::SchemaLoader * const) 0x103cba0
> (gdb) p *this
> $18 = {impl = {mutex = {futex = 4031237736, static EXCLUSIVE_HELD = 2147483648,
>       static EXCLUSIVE_REQUESTED = 1073741824,
>       static SHARED_COUNT_MASK = 1073741823},
>     value = {disposer = 0x7ffff07208e8, ptr = 0xfffffffffffffffd}}}
>
> or in hex like this:
>
> (gdb) p/x *this
> $26 = {impl = {mutex = {futex = 0xf047ce68, static EXCLUSIVE_HELD = 0x80000000,
>       static EXCLUSIVE_REQUESTED = 0x40000000,
>       static SHARED_COUNT_MASK = 0x3fffffff},
>     value = {disposer = 0x7ffff07208e8, ptr = 0xfffffffffffffffd}}}
> ```
>
> In particular, we note that the instance member `mutex.futex` has an
> invalid value 0xf047ce68 (it should have been 0 at this point in the
> single-threaded execution); `impl.value.ptr` also has an invalid value,
> 0xfffffffffffffffd, when it should have been either null or a valid
> pointer. Subsequently, when `kj::Mutex::lock` attempts to lock the futex,
> the system call never returns because of the bogus value in `mutex.futex`.
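
As a worked check on the dump above, a small C++ sketch that decodes a futex word using the three constants gdb printed. `FutexState` and `decodeFutex()` are hypothetical helpers, not part of kj, and the meaning of each bit field is inferred from the constant names.

```cpp
#include <cstdint>

// Bit masks taken from the values gdb printed for kj::Mutex's constants.
constexpr std::uint32_t EXCLUSIVE_HELD = 0x80000000u;
constexpr std::uint32_t EXCLUSIVE_REQUESTED = 0x40000000u;
constexpr std::uint32_t SHARED_COUNT_MASK = 0x3fffffffu;

// Hypothetical decoded view of a futex word.
struct FutexState {
  bool exclusiveHeld;
  bool exclusiveRequested;
  std::uint32_t sharedCount;
};

// Splits a raw futex word into the flags and counter the masks describe.
constexpr FutexState decodeFutex(std::uint32_t word) {
  return FutexState{(word & EXCLUSIVE_HELD) != 0,
                    (word & EXCLUSIVE_REQUESTED) != 0,
                    word & SHARED_COUNT_MASK};
}
```

`decodeFutex(0xf047ce68)` reports `exclusiveHeld` and `exclusiveRequested` both set and a shared count of 0x3047ce68 (810012264): a state that no thread in a single-threaded run would ever release, which is consistent with the hang. An unlocked mutex (`0`) decodes to all false and zero.
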
>
> --
> You received this message because you are subscribed to the Google Groups
> "Cap'n Proto" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> Visit this group at https://groups.google.com/group/capnproto.
>
