FFI::AutoPointer occasionally calls releaser proc when GC'd, even if 
autorelease is set to false
------------------------------------------------------------------------------------------------

                 Key: JRUBY-5813
                 URL: http://jira.codehaus.org/browse/JRUBY-5813
             Project: JRuby
          Issue Type: Bug
          Components: Extensions
    Affects Versions: JRuby 1.6.1
         Environment: Snow Leopard 10.6.7 with Java 1.6.0_24-b07-334
            Reporter: Daniel Azuma


In summary, FFI::AutoPointer will occasionally call the release proc even
after autorelease=false has been executed. When this happens, it can cause
serious problems, including segfaults, for clients that rely on AutoPointer
to manage their memory. The cause is an assumption that AutoPointer makes
about the operation of the garbage collector, and that assumption does not
always seem to hold.

I found this while working with the ffi-geos gem 
(https://github.com/dark-panda/ffi-geos), which exercises AutoPointer quite a 
bit.

FFI::AutoPointer provides an "autorelease" mechanism by which a proc can be 
called when the AutoPointer is garbage collected. (ffi-geos uses this mechanism 
to release the memory held by the pointer.) AutoPointer also provides a 
mechanism whereby the "autorelease" can be enabled and disabled. (ffi-geos uses 
this mechanism because it has to handle certain use cases where another object 
"takes ownership" of the pointer and so the AutoPointer is no longer 
responsible for releasing the memory. That is, it sometimes calls 
AutoPointer#autorelease=false, and expects that, if the AutoPointer becomes 
unreachable after that point, the release proc will *not* be called.)

AutoPointer implements this by using a PhantomReference called the "Reaper". 
When the AutoPointer becomes phantom-reachable, the Reaper, if present and 
reachable, runs the release proc. The autorelease enable/disable switch is 
implemented by adding/removing the Reaper to a global collection. When the 
Reaper is in the global collection, it is strongly referenced and thus queued 
for execution when the AutoPointer becomes phantom-reachable. When the Reaper 
is not in the global collection, it becomes unreachable along with its 
associated AutoPointer and therefore is not queued for execution.
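
To make the design described above concrete, here is a minimal Java sketch of
it. This is not the JRuby source: the names ReaperSketch, Handle, Reaper, LIVE
and QUEUE are hypothetical stand-ins for AutoPointer, its Reaper, and the
global collection, and the thread that would drain the reference queue is left
out.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ReaperSketch {
    // Hypothetical stand-in for the global collection. Membership is what
    // keeps a Reaper strongly reachable after its owner has gone away.
    static final Set<Reaper> LIVE = ConcurrentHashMap.newKeySet();

    // Queue on which the collector may enqueue a Reaper once its owner
    // becomes phantom-reachable.
    static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();

    static final class Reaper extends PhantomReference<Object> {
        final Runnable releaser;

        Reaper(Object owner, Runnable releaser) {
            super(owner, QUEUE);
            this.releaser = releaser;
        }
    }

    // Hypothetical stand-in for AutoPointer.
    static final class Handle {
        final Reaper reaper;

        Handle(Runnable releaser) {
            this.reaper = new Reaper(this, releaser);
            LIVE.add(reaper); // autorelease starts enabled in this sketch
        }

        void setAutorelease(boolean enabled) {
            if (enabled) {
                LIVE.add(reaper);
            } else {
                // The Reaper is now reachable only through its owner.
                LIVE.remove(reaper);
            }
        }
    }

    public static void main(String[] args) {
        Handle h = new Handle(() -> System.out.println("released"));
        System.out.println("registered: " + LIVE.contains(h.reaper));
        h.setAutorelease(false);
        System.out.println("registered: " + LIVE.contains(h.reaper));
    }
}
```

In this sketch, disabling autorelease only changes whether the Reaper is
strongly reachable; nothing records the "disabled" state on the Reaper itself.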

At least, this seems to have been the intent. In practice, I have observed
rare but definite instances in which the Reaper is executed anyway, even though
it had been removed from the global collection long before. This manifests as a
spurious call to the release proc even though autorelease had been set to
false. In the case of ffi-geos, this results in a segmentation fault because
the associated memory gets released twice.

My belief is that this occurs because we cannot fully predict (and it is not
specified) in which order the garbage collector computes object reachability
and queues references. AutoPointer depends on the Reaper being marked as
unreachable *before* its corresponding AutoPointer becomes phantom-reachable
and the Reaper is placed on its phantom reference queue. This seems to be true
most, but not all, of the time.

I've provided a small patch that appears to fix the issue:
https://github.com/dazuma/jruby/commit/19f0dcde0570e21e4c7772208a5489093e5502f3

It puts an additional guard around the running of the release proc. When the
Reaper runs, it now checks that it was actually present in the global
collection before calling the release proc. A Reaper that runs even though
autorelease had been set to false (i.e. the Reaper was no longer in the global
collection) therefore does nothing. Because the Reaper already attempts to
remove itself from the global collection when it is executed, the result of
that check was already being computed, and so the overhead added by actually
using it is vanishingly small.
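
The guard can be sketched in Java as follows. This is an illustration of the
idea only, not the contents of the linked commit: GuardedReaperSketch,
GuardedReaper and LIVE are hypothetical names, and the Reaper is modelled as a
plain Runnable so that the behaviour can be exercised without involving the
garbage collector.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class GuardedReaperSketch {
    // Hypothetical stand-in for the global collection of active Reapers.
    static final Set<GuardedReaper> LIVE = ConcurrentHashMap.newKeySet();

    static final class GuardedReaper implements Runnable {
        private final Runnable releaser;

        GuardedReaper(Runnable releaser) {
            this.releaser = releaser;
        }

        void setAutorelease(boolean enabled) {
            if (enabled) {
                LIVE.add(this);
            } else {
                LIVE.remove(this);
            }
        }

        @Override
        public void run() {
            // Set.remove returns true only if this Reaper was still
            // registered, so the removal doubles as the guard.
            if (LIVE.remove(this)) {
                releaser.run();
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        GuardedReaper reaper = new GuardedReaper(calls::incrementAndGet);

        reaper.setAutorelease(true);
        reaper.setAutorelease(false);
        reaper.run(); // spurious run after autorelease=false: releaser skipped
        System.out.println("calls after spurious run: " + calls.get());

        reaper.setAutorelease(true);
        reaper.run(); // legitimate run: releaser invoked
        System.out.println("calls after legitimate run: " + calls.get());
    }
}
```

Using the return value of the removal means the check and the removal are a
single operation, so a second run of the same Reaper also skips the releaser.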

This is a highly intermittent issue, so I have no test case to submit. Without
the patch, I can get a segfault in roughly 20-30% of runs of the test suite for
the rgeo gem (the github head at https://github.com/dazuma/rgeo) in the
presence of the ffi-geos gem (0.0.1.beta3). I am confident that the above patch
fixes the issue because I have run the test suite several dozen times with the
patch applied and have not observed any failures.
