Makes sense to me. The weak map has been implicated in at least a couple of IO-related issues:

http://jira.codehaus.org/browse/JRUBY-1079 (IO.sysopen not defined)
http://jira.codehaus.org/browse/JRUBY-1527 (Tempfile fails to clean up)

In each case, the fact that JRuby automatically closes the file descriptor when the relevant IO-handling objects are garbage-collected has introduced complications.

The Ruby IO class's open/block syntax provides an explicit contract: the file descriptor is closed when the block exits. Since there is an explicit way to do this, I'm not sure it's a good idea to also do it implicitly -- e.g. when you use File::new and the instance gets garbage-collected -- even though MRI does so. (See the snippet after my signature.)
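
To make the contrast concrete, here is a minimal sketch of the two forms. It assumes a Unix-like system where /dev/null exists; the variable names are only for illustration.

```ruby
# Assumes a Unix-like system where /dev/null exists.

# Block form: File.open closes the file when the block exits.
handle = nil
File.open("/dev/null") do |f|
  handle = f                 # keep a reference only so we can inspect it later
end
closed_by_block = handle.closed?   # closed by File.open, not by the GC

# Non-block form: the caller owns the descriptor and must close it.
file = File.new("/dev/null")
open_before_close = !file.closed?
file.close
```

In the block form the close happens at a well-defined point; in the non-block form nothing closes the file until close is called or the object is finalized.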

Perhaps compatibility with MRI on this point is why a weak map was chosen? So the question may be: Is it better to emulate MRI in this case, or to avoid surprising behavior?

--Riley

#---snip---#

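# Opens a file but returns only its fileno; the File object itself
# becomes unreachable (and eligible for GC) as soon as this returns.
def my_sysopen(path)
  file = ::File.new(path)
  return file.fileno
end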

do_gc = true
do_mysopen = true

my_path = "/dev/null"
my_fileno = do_mysopen ? my_sysopen(my_path) : ::IO.sysopen(my_path)
::GC.start if do_gc

my_file = ::File.open(my_fileno)
# if do_mysopen && do_gc, raises Errno::EBADF here

my_file.readline # else raises EOFError here

#---snip---#


On Feb 23, 2009, at 11:15 PM, Charles Oliver Nutter wrote:

It just occurred to me that our map from numeric file descriptors to open channels doesn't need to be weak, and indeed making it weak could actually be *incorrect*.

In libc, if you open a file, you get a file descriptor as an integer. That descriptor is guaranteed to be kept open for you until the process terminates or you close the fd yourself. So our allowing the channel associated with a numeric fd to possibly GC and finalize breaks that model.
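
A minimal Ruby sketch of that model, assuming a Unix-like system where /dev/null exists: IO.sysopen hands back a bare integer, and since no Ruby object owns it, a GC run in between should have nothing to finalize.

```ruby
# Assumes a Unix-like system where /dev/null exists.
fd = IO.sysopen("/dev/null")   # a plain Integer; no IO object wraps it yet
GC.start                       # no Ruby object owns fd, so nothing to finalize
io = IO.for_fd(fd)             # wrap the still-open descriptor in an IO object
at_eof = io.eof?               # reading /dev/null hits end-of-file immediately
io.close                       # the descriptor is released only here
```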

Also, the case we're trying to prevent with a weak descriptor map -- that of an application spinning up lots of IO objects and never closing them -- would be a leak under libc as well; you'd quickly reach an open fd limit before garbage collection kicks in.

It seems like what we really want here is for the ChannelDescriptor associated with the fd to only unregister itself on GC or close, and have the map be hard references. This allows e.g. sysopen to work correctly all the time (rather than having a separate hard-referencing map as we do now) and probably wouldn't lead to any more descriptor leakage than we have today, since the expectation is that all channels are being properly closed to begin with. And the finalization of ChannelDescriptor would help ensure the map gets cleaned up if an application really is leaking descriptors.
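
A rough Ruby sketch of the hard-reference idea, for illustration only. DescriptorRegistry and SketchDescriptor are hypothetical names, not JRuby's actual classes, and the sketch covers only the explicit-close path; the unregister-on-GC half is left out. It assumes a Unix-like system where /dev/null exists.

```ruby
# Hypothetical sketch: DescriptorRegistry and SketchDescriptor are
# illustrative names, not JRuby classes.
class DescriptorRegistry
  def initialize
    @channels = {}   # fileno => channel; a plain Hash holds strong references
  end

  def register(fileno, channel)
    @channels[fileno] = channel
  end

  def unregister(fileno)
    @channels.delete(fileno)
  end

  def lookup(fileno)
    @channels[fileno]
  end
end

class SketchDescriptor
  attr_reader :fileno

  def initialize(registry, channel)
    @registry = registry
    @channel = channel
    @fileno = channel.fileno
    registry.register(@fileno, channel)
  end

  # Explicit close releases the channel and removes the map entry.
  def close
    @channel.close
    @registry.unregister(@fileno)
  end
end

registry = DescriptorRegistry.new
# Keep only the integer, as a sysopen-style caller would; the wrapper is dropped.
fileno = SketchDescriptor.new(registry, File.new("/dev/null")).fileno
GC.start
channel = registry.lookup(fileno)            # the registry still holds the channel
open_after_gc = !channel.nil? && !channel.closed?
channel.close
registry.unregister(fileno)
```

Because the Hash keeps the channel reachable, dropping the wrapper and running the GC does not close it; the entry goes away only when something closes and unregisters it.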

Does this make sense?

- Charlie
