FFI::AutoPointer occasionally calls releaser proc when GC'd, even if
autorelease is set to false
------------------------------------------------------------------------------------------------
Key: JRUBY-5813
URL: http://jira.codehaus.org/browse/JRUBY-5813
Project: JRuby
Issue Type: Bug
Components: Extensions
Affects Versions: JRuby 1.6.1
Environment: Snow Leopard 10.6.7 with Java 1.6.0_24-b07-334
Reporter: Daniel Azuma
The summary is that FFI::AutoPointer will occasionally call the release proc
even after autorelease=false has been executed. When it happens, this can cause
serious issues including segfaults for clients that rely on AutoPointer to
direct their memory management. This is due to an assumption made by
AutoPointer about the operation of the garbage collector, and it seems that
assumption is not always true.
I found this while working with the ffi-geos gem
(https://github.com/dark-panda/ffi-geos), which exercises AutoPointer quite a
bit.
FFI::AutoPointer provides an "autorelease" mechanism by which a proc can be
called when the AutoPointer is garbage collected. (ffi-geos uses this mechanism
to release the memory held by the pointer.) AutoPointer also provides a
mechanism whereby the "autorelease" can be enabled and disabled. (ffi-geos uses
this mechanism because it has to handle certain use cases where another object
"takes ownership" of the pointer and so the AutoPointer is no longer
responsible for releasing the memory. That is, it sometimes calls
AutoPointer#autorelease=false, and expects that, if the AutoPointer becomes
unreachable after that point, the release proc will *not* be called.)
AutoPointer implements this by using a PhantomReference called the "Reaper".
When the AutoPointer becomes phantom-reachable, the Reaper, if present and
reachable, runs the release proc. The autorelease enable/disable switch is
implemented by adding/removing the Reaper to a global collection. When the
Reaper is in the global collection, it is strongly referenced and thus queued
for execution when the AutoPointer becomes phantom-reachable. When the Reaper
is not in the global collection, it becomes unreachable along with its
associated AutoPointer and therefore is not queued for execution.
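To make the design described above concrete, here is a minimal sketch in Java. It is not the JRuby source: the names ReaperSketch, LIVE, QUEUE, setAutorelease and run are hypothetical, and the thread that would drain the reference queue is omitted. It only illustrates the idea that enabling autorelease means placing the Reaper in a global collection, and that the Reaper calls the release proc unconditionally when it is run.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only; all names here are assumptions, not JRuby identifiers.
final class ReaperSketch {
    // Global collection: a Reaper stored here stays strongly reachable.
    static final Set<Reaper> LIVE = ConcurrentHashMap.newKeySet();
    // Queue on which the collector places Reapers whose owner has died.
    static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();

    static final class Reaper extends PhantomReference<Object> {
        private final Runnable releaser;

        Reaper(Object owner, Runnable releaser) {
            super(owner, QUEUE);
            this.releaser = releaser;
        }

        // Models autorelease=true / autorelease=false.
        void setAutorelease(boolean enabled) {
            if (enabled) {
                LIVE.add(this);
            } else {
                LIVE.remove(this);
            }
        }

        // Called by whatever drains QUEUE. The result of remove() is
        // ignored, so the releaser runs whether or not this Reaper was
        // still registered.
        void run() {
            LIVE.remove(this);
            releaser.run();
        }
    }
}
```

In this sketch nothing inside run() knows whether autorelease is still enabled. The design relies entirely on a disabled Reaper never reaching run() in the first place.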
At least, this seems to have been the intent. In actuality, I have observed
rare but definite instances when the Reaper gets executed anyway even though it
had (long before) been removed from the global collection. This manifests as a
spurious call to the release proc even though autorelease had been set to false. In
the case of ffi-geos, this results in a segmentation fault because the
associated memory gets released twice.
My belief is that this occurs because we cannot fully predict (and it is not
specified) in which order the garbage collector computes object reachability
and queues references. AutoPointer depends on the Reaper being found
unreachable *before* its corresponding AutoPointer becomes phantom-reachable
and the Reaper is placed on the phantom reference queue. This seems to be true
most but not all of the time.
I've provided a small patch that appears to fix the issue:
https://github.com/dazuma/jruby/commit/19f0dcde0570e21e4c7772208a5489093e5502f3
It puts an additional guard around the running of the release proc. When the
Reaper runs, it first checks that it was actually present in the global
collection (i.e. that autorelease had not been set to false), and calls the
release proc only in that case. Because the Reaper already attempts to remove
itself from the global collection when it is executed, that check was already
being computed, and so the overhead added by actually using its result is
vanishingly small.
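The guard can be sketched as follows. This is an illustration of the idea only, not the contents of the linked commit. The names GuardedReaperSketch, LIVE, QUEUE, setAutorelease and run are hypothetical, and the sketch assumes the global collection behaves like a java.util.Set, whose remove() returns true only if the element was present.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only; all names here are assumptions, not JRuby identifiers.
final class GuardedReaperSketch {
    static final Set<Reaper> LIVE = ConcurrentHashMap.newKeySet();
    static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();

    static final class Reaper extends PhantomReference<Object> {
        private final Runnable releaser;

        Reaper(Object owner, Runnable releaser) {
            super(owner, QUEUE);
            this.releaser = releaser;
        }

        // Models autorelease=true / autorelease=false.
        void setAutorelease(boolean enabled) {
            if (enabled) {
                LIVE.add(this);
            } else {
                LIVE.remove(this);
            }
        }

        // The removal was already happening; the only change is that its
        // result now decides whether the releaser is called.
        void run() {
            boolean wasRegistered = LIVE.remove(this);
            if (wasRegistered) {
                releaser.run();
            }
        }
    }
}
```

With this shape, a Reaper that is run after autorelease was disabled does nothing, and a Reaper that is run twice calls the release proc at most once.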
This is a highly intermittent issue, so I have no test case to submit. Without
the patch, I can get the test suite for the rgeo gem (the github head at
https://github.com/dazuma/rgeo) to segfault around 20-30% of the time when it
is run in the presence of the ffi-geos gem (0.0.1.beta3). I am confident that
the above patch fixes the issue because I have run the test suite several dozen
times with the patch and have not observed any failures.