Makes sense to me. The weak map has been implicated in at least a
couple of IO-related issues:
http://jira.codehaus.org/browse/JRUBY-1079 (IO.sysopen not defined)
http://jira.codehaus.org/browse/JRUBY-1527 (Tempfile fails to clean up)
In each case, the fact that JRuby automatically closes the file
descriptor when the relevant IO-handling objects are garbage-collected
has introduced complications.
The block form of the Ruby IO class's open method provides an explicit
contract: the file descriptor is closed as soon as the block exits.
Since there is an explicit way to do this, I'm not sure it's a good
idea to also do it implicitly -- e.g. when you use File::new and the
instance gets garbage-collected -- even though MRI does so. (See the
snippet below my signature.)
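A minimal sketch of that explicit contract, separate from the repro snippet further down. The helper name first_line_and_closed is hypothetical, and the sketch assumes a Ruby with the standard tempfile library available:

```ruby
require "tempfile"

# Hypothetical helper (not part of JRuby or MRI): reads the first line of
# a file using the block form, then reports whether the File object was
# closed once the block exited.
def first_line_and_closed(path)
  seen = nil
  line = File.open(path) do |f|
    seen = f   # keep a reference only so we can inspect it afterwards
    f.gets
  end
  [line, seen.closed?]
end

tmp = Tempfile.new("fd-demo")
tmp.write("hello\n")
tmp.flush

line, closed = first_line_and_closed(tmp.path)
puts "line=#{line.inspect} closed=#{closed}"
tmp.close!   # close and delete the temporary file
```

The descriptor is released at a well-defined point (block exit), with no dependence on when or whether the garbage collector runs.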
Perhaps compatibility with MRI on this point is why a weak map was
chosen? So the question may be: Is it better to emulate MRI in this
case, or to avoid surprising behavior?
--Riley
#---snip---#
# Opens the file, then returns only the numeric descriptor; the File
# object itself becomes unreachable once the method returns.
def my_sysopen(path)
  file = ::File.new(path)
  return file.fileno
end

do_gc = true
do_mysopen = true
my_path = "/dev/null"
my_fileno = do_mysopen ? my_sysopen(my_path) : ::IO.sysopen(my_path)
::GC.start if do_gc
my_file = ::File.open(my_fileno)
# if do_mysopen && do_gc, raises Errno::EBADF here
my_file.readline # else raises EOFError here
#---snip---#
On Feb 23, 2009, at 11:15 PM, Charles Oliver Nutter wrote:
It just occurred to me that our map from numeric file descriptors to
open channels doesn't need to be weak, and indeed making it weak
could actually be *incorrect*.
In libc, if you open a file, you get a file descriptor as an
integer. That descriptor is guaranteed to be kept open for you until
the process terminates or you close the fd yourself. So our allowing
the channel associated with a numeric fd to possibly GC and finalize
breaks that model.
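A minimal sketch of that libc contract as seen from Ruby, assuming MRI on a system where File::NULL names a readable null device. The descriptor returned by IO.sysopen is a plain integer, so there is no Ruby object whose collection could close it:

```ruby
fd = IO.sysopen(File::NULL)   # raw integer descriptor; no IO object wraps it yet
GC.start                      # nothing Ruby-side owns fd, so nothing can finalize it
io = IO.for_fd(fd)            # wrap the still-open descriptor
data = io.read                # the null device reads as empty
io.close                      # explicit close is what releases the descriptor
puts "fd=#{fd} data=#{data.inspect} closed=#{io.closed?}"
```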
Also, the case we're trying to prevent with a weak descriptor map--
that of an application spinning up lots of IO objects and never
closing them--would be a leak under libc as well; you'd quickly
reach an open fd limit before garbage collection kicks in.
It seems like what we really want here is for the ChannelDescriptor
associated with the fd to unregister itself only on GC or close, and
for the map to hold hard references. This allows e.g. sysopen to work
correctly all the time (rather than having a separate hard-
referencing map as we do now) and probably wouldn't lead to any more
descriptor leakage than we have today, since the expectation is that
all channels are being properly closed to begin with. And the
finalization of ChannelDescriptor would help ensure the map gets
cleaned up if an application really is leaking descriptors.
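A rough sketch of the hard-referencing map idea, written in Ruby for illustration only. DescriptorRegistry and FakeDescriptor are hypothetical names and do not reflect JRuby's actual ChannelDescriptor code; the finalization-based cleanup mentioned above is deliberately left out:

```ruby
# Hypothetical registry: an ordinary Hash holds strong references, so an
# entry cannot disappear just because the caller dropped its reference.
class DescriptorRegistry
  def initialize
    @by_fileno = {}
  end

  def register(descriptor)
    @by_fileno[descriptor.fileno] = descriptor
  end

  def unregister(fileno)
    @by_fileno.delete(fileno)
  end

  def lookup(fileno)
    @by_fileno[fileno]
  end
end

# Hypothetical stand-in for a descriptor object; it does no real IO.
class FakeDescriptor
  attr_reader :fileno

  def initialize(fileno, registry)
    @fileno = fileno
    @registry = registry
    @open = true
    registry.register(self)
  end

  def open?
    @open
  end

  # In this sketch, closing is the only path that removes the map entry.
  def close
    return unless @open
    @open = false
    @registry.unregister(@fileno)
  end
end

registry = DescriptorRegistry.new
FakeDescriptor.new(7, registry)   # caller drops its reference immediately
GC.start
still_there = registry.lookup(7)  # the entry survives the GC run
still_there.close
gone = registry.lookup(7)         # nil once the descriptor has been closed
```

With a weak map in place of the Hash, the lookup after GC.start could come back empty, which is the sysopen failure mode discussed in this thread.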
Does this make sense?
- Charlie