Re: [lucy-dev] Hosts, refcounts and destructors

Marvin Humphrey Tue, 03 Feb 2015 19:16:18 -0800

On Tue, Feb 3, 2015 at 4:57 AM, Nick Wellnhofer <[email protected]> wrote:
> On 03/02/2015 03:33, Marvin Humphrey wrote:
>>
>> Second, under Python, Clownfish objects use the Python refcount -- which
>> Python of course accesses directly, rather than going through the
>> Clownfish Inc_RefCount, Dec_RefCount and Get_RefCount wrappers.  This
>> presents problems for immortal classes like Class, Method, and BoolNum
>> which override those refcount manipulation methods to do nothing.  A
>> incref from Clownfish-space followed by a decref from Python-space will
>> cause the refcount to decrease, eventually triggering an immortal
>> object's destructor while valid references still exist.
>
>
> In the Perl bindings, the work-around is to intercept the destructor. I
> guess that's not possible in Python.


Actually, it is possible.

Python supports a hook for the destructor (`tp_dealloc`, which corresponds to
Clownfish's `Destroy` and is called by Py_DECREF and friends when the refcount
reaches 0):

    https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_dealloc

It also supports a separate hook for the code which frees the object
allocation (`tp_free`):

    https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_free

The complementary functions from object *creation* time are `tp_alloc`,
`tp_new`, and `tp_init`.

    http://eli.thegreenplace.net/2012/04/16/python-object-creation-sequence/

    `tp_alloc` is a low-level slot of the type object in CPython. It's not
    directly accessible from Python code, but should be familiar to C
    extension developers. A custom type defined in a C extension may override
    this slot to supply a custom memory allocation scheme for instances of
    itself...

So, we could theoretically supply versions of `tp_alloc` and `tp_free` which
call `malloc` and `free` internally instead of using Python's allocator.
I have not taken this path because the default implementation of `tp_alloc`
does a bunch of other stuff under debugging builds which I don't want to
duplicate.

However, what I *am* proposing is to install a custom `tp_dealloc`
for the "immortal" classes which resets the refcount and stops there.

If there's a reason that won't work, I haven't discovered it yet.

> (Is our Perl implementation even safe?
> Isn't it possible that Perl reuses the cached "inner object" SV if the Perl
> refcount drops to zero?)

We should be safe because I don't think that's possible.  Perl never gets at
the inner object directly: we give Perl a new RV every time a Clownfish object
needs to be accessed from Perl-space.  Creating a new RV causes the inner
object's refcount to rise by 1; when the RV ultimately goes away, it causes
the inner object's refcount to fall by 1, resulting in no net change.  And all
these refcount manipulations are thread-safe because the we `SvSHARE` the
inner object for "immortal" Clownfish types.

In the current implementation, intercepting DESTROY should only come into play
during Perl's global destruction.

But maybe to be safer, immortal classes should have DESTROY implementations
which clear the cached host object, instead of the no-op DESTROY
implementations they have now.

>> To address this issue, I think what we ought to do is take refcount
>> manipulation outside of method calls, instead manipulating refcounts
>> directly using macros which wrap either concrete or `static inline`
>> functions.  Immortal types can have their refcount set absurdly high on
>> object creation, and can have their destructors reset the refcount
>> rather than invoke deallocation.
>
> Some of this was already proposed when discussing the "immortal" flag. I
> also have a half-finished branch lying around somewhere.

Here's the thread and the branch for reference:

    http://markmail.org/message/e7h3kgq62c24zeqe
    
http://mail-archives.apache.org/mod_mbox/lucy-dev/201408.mbox/%3C53DF557D.9030705%40aevum.de%3E
    
https://git1-us-west.apache.org/repos/asf?p=lucy-clownfish.git;a=shortlog;h=refs/heads/clone_class_registry

>> Under that system, refcounts will be exceedingly unlikely to fall to 0
>> under normal circumstances.  However, we still have to take care not to
>> introduce a security vulnerability which happens when a destructor fires
>> on an immortal object in case someone figures out how to make that
>> happen artificially -- so I'd like to know if anybody sees problems now,
>> and I'd like to request that any commits I eventually supply receive
>> above-average scrutiny.
>
> I don't really like the "absurdly high refcount" idea for exactly that
> reason. The actual problem with "immortal" objects is that they're shared
> across threads and we want to avoid concurrent refcount manipulation. Maybe
> this isn't an issue in the Python bindings due to the global interpreter
> lock?

Yes.  In Python, a C extension must explicitly *release* the GIL in order to
enable concurrency.

    
https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock

Marvin Humphrey

Re: [lucy-dev] Hosts, refcounts and destructors

Reply via email to