RE: Finalizer strangeness

Simon Peyton-Jones Tue, 15 Oct 2002 04:36:25 -0700

|   - in GHC, references from a finalizer are treated as weak
|     references, that is, they don't keep anything alive.  In our
|     paper on weak pointers I seem to recall there was a good reason
|     for this, but I'll need to go back and look it up again.  I'm
|     beginning to think the decision seems a bit odd (after all, in
|     the next GC, the references become non-weak again, because the
|     finalizer is up and running).


There's a good reason for this: otherwise the finaliser would keep alive
the very object whose death it is watching for.

The problem of finalisation order is well known:
        http://www.iecc.com/gclist/GC-lang.html#Finalization

I do not know of any solid solution.  Here for example is one message
from the GC list.  There is a ton of stuff on finalisation in the GC
list archive.

Simon
=======================


I just read through the whole finalization thread several times. Here's
my attempt at a summary, and four questions.

A finalizer is a piece of user-supplied code that the garbage collector
runs when an object becomes garbage --- *with a reference to the garbage
object*.
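
To make the definition concrete, here is a minimal sketch in Python. The
names FileHandle, Node and build are made up for illustration, and the
prompt timing relies on CPython's reference counting; a tracing collector
would run the finalizer at some later, unspecified time.

```python
class FileHandle:
    """Stand-in for a non-memory resource (hypothetical, for illustration)."""
    open_handles = 0

    def __init__(self, name):
        self.name = name
        FileHandle.open_handles += 1

    def __del__(self):
        # The finalizer receives the dying object itself as `self`.
        FileHandle.open_handles -= 1


class Node:
    """A list-like node; the handle may sit arbitrarily deep in the structure."""

    def __init__(self, child=None, handle=None):
        self.child = child
        self.handle = handle


def build(depth):
    # Bury one handle under `depth` levels of nodes.
    node = Node(handle=FileHandle("leaf.dat"))
    for _ in range(depth):
        node = Node(child=node)
    return node


root = build(depth=5)
opened = FileHandle.open_handles      # one handle is open
root = None                           # drop the only reference to the structure
remaining = FileHandle.open_handles   # the finalizer has already run in CPython
```

Nothing outside FileHandle needs to know that a resource is buried in the
structure; dropping the root is enough.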

Finalizers are useful for the following reason:

    The problem is that if the language is missing finalization, sooner
    or later you will need some other means to deallocate a non-memory
    resource embedded deep in a data structure.  But there is no way to
    determine when to deallocate that resource without redoing all of
    the work already being done by the garbage collector.  A 100,000
    line program probably needs finalization in only one or two places;
    but doing without finalization will involve essentially rewriting
    the whole program for manual memory management, with all of the
    problems that entails.

    I find it far easier to read and verify correctness of a program
    that includes one or two finalizer uses, than to read one that is
    manually reference counted, because the builtin GC didn't quite
    have the necessary functionality.

        -- Hans-J. Boehm, on gclist around 2001-10-08, in message-id
           <[EMAIL PROTECTED]>

But there are some problems with finalizers.

1. Depending on the collector being used, they can retain those
resources for arbitrary periods of time after they're no longer being
used.  ("Finalization is not prompt.")

2. It's hard to figure out what context to run finalizers in:
- running finalizers in a user thread has the potential to
  indefinitely block that user thread; the halting problem is still
  hard, although there are approaches that involve code carrying proof
  that it will halt.  Running them in a user thread also has a
  deadlock problem if your program uses locks: if the finalizer uses a
  lock that the user code higher up the stack holds, the thread will
  deadlock.
- whether running finalizers in a user thread or in a special thread,
  there is the question of what to do with raised exceptions; allowing
  them to unwind the stack will probably unwind lots of things not at
  all related to the finalizer's context, but just discarding them is
  probably bad, too.
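
The lock hazard in the first bullet can be sketched in Python as follows.
This assumes CPython, where dropping the last reference runs the finalizer
synchronously in the current thread; Cache, user_code and events are
hypothetical names. The finalizer only probes the lock, because a blocking
acquire would hang the example.

```python
import threading

lock = threading.Lock()   # non-reentrant, like most plain mutexes
events = []


class Cache:
    def __del__(self):
        # A blocking lock.acquire() here would hang forever in the scenario
        # below, so this sketch only probes the lock without waiting.
        if lock.acquire(blocking=False):
            try:
                events.append("finalizer ran cleanup")
            finally:
                lock.release()
        else:
            events.append("finalizer found lock held by its own thread")


def user_code():
    victim = Cache()
    with lock:
        # Dropping the last reference runs the finalizer right here,
        # on this thread, while the lock is still held.
        victim = None
        events.append("user code still holds lock")


user_code()
```

On the exception question, CPython takes the "discard" option: an exception
raised inside __del__ is reported on stderr and otherwise ignored.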

Rick Hudson writes, of finalizer threads:
    This in turn introduces the concept of a special case "finalization
    thread" that is not like other threads, application code can't put a
    priority on it or control it or even kill it yet it can run
    application code.

3. Since finalizers can resurrect dead objects, they make the
collector's job harder.  The collector must wait until the finalizer is
finished before it can actually recycle objects the finalizer might
resurrect or run those objects' finalizers.  Also, object resurrection
is, in the abstract, a semantic difficulty; but in practice it doesn't
seem to cause many problems.  (Except in early versions of Java, where
you could use object resurrection to make immortal objects.)
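
Resurrection is easy to demonstrate. A minimal Python sketch, assuming
CPython's prompt reference counting (Phoenix and graveyard are made-up
names):

```python
graveyard = []


class Phoenix:
    finalizer_runs = 0

    def __del__(self):
        Phoenix.finalizer_runs += 1
        # Storing `self` somewhere reachable resurrects the object.
        graveyard.append(self)


bird = Phoenix()
bird.payload = "still here"
bird = None                  # last reference dropped; the finalizer runs

resurrected = graveyard[0]   # the object survived its own finalization
```

The collector cannot know whether the object is really dead until the
finalizer has returned, which is why reclamation has to wait.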

4. It's hard to figure out what to do with objects that are still alive
at program termination.  If you run their finalizers, then those
finalizers have to deal with the fact that some objects they have access
to may already have been finalized, which can never happen during normal
program execution.  Conversely, if you don't run their finalizers, the
resources you were depending on the finalizers to release might not get
released. ("Finalizers are not sure.")

5. Whatever subsystem is responsible for allocating the resource you're
using finalizers to release should know that you're doing this
--- because when someone tries to allocate, say, another file
descriptor, and there aren't any free, it should be smart enough to
invoke a garbage collection or two and try again.  Not doing things this
way means that your garbage collector not being prompt or sure enough
will break your program; for example, adding more memory might cause a
previously working program to run out of file descriptors.  [I didn't
see much discussion of this problem, so perhaps it doesn't come up much
in real life?]
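
A rough sketch of such an allocator in Python. DescriptorPool and Handle
are hypothetical stand-ins for the real descriptor table, and gc.collect()
plays the role of "invoke a garbage collection". The handles are put in a
reference cycle on purpose, so that only the cycle collector can reclaim
them.

```python
import gc
import weakref


class DescriptorPool:
    """Hypothetical fixed-size pool standing in for the OS descriptor table."""

    def __init__(self, size):
        self.free = list(range(size))
        self.forced_collections = 0

    def _try_take(self):
        return self.free.pop() if self.free else None

    def allocate(self):
        fd = self._try_take()
        if fd is None:
            # Out of descriptors: ask the collector to run, then retry once.
            self.forced_collections += 1
            gc.collect()
            fd = self._try_take()
        if fd is None:
            raise OSError("out of descriptors even after a collection")
        return Handle(self, fd)


class Handle:
    def __init__(self, pool, fd):
        self.fd = fd
        self.me = self   # reference cycle: reference counting alone won't free it
        # The callback refers only to the pool's free list and the number,
        # never to the handle itself, so it does not keep the handle alive.
        weakref.finalize(self, pool.free.append, fd)


pool = DescriptorPool(size=2)
a = pool.allocate()
b = pool.allocate()
a = b = None           # both handles are garbage, but perhaps not yet collected
c = pool.allocate()    # succeeds because allocate() can force a collection
```

Without the retry in allocate(), the third allocation would fail whenever
the collector had not happened to run since a and b were dropped.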

6. [Something or other about ordering? is this the thing in section 3
about when to finalize objects that are referenced from other
finalizable objects?]

Some people like "death notices" instead of finalizers.  A "death
notice" is a message placed by the collector when it frees an object,
which a live object can later read and take action on.
java.lang.ref.PhantomReference is one way of implementing this: it's a
lot like a weak reference (i.e. it doesn't keep the object from being
collected and you get notified when the object has been collected)
except that you can't dereference it, ever, even when the object exists.
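
A death-notice queue can be sketched in Python with weakref.finalize. The
callback here is restricted to enqueuing an id and never sees the object,
which is the essential property; Connection and drain_notices are
hypothetical names, and the immediate queuing relies on CPython's
reference counting.

```python
import weakref
from collections import deque

death_notices = deque()   # written when an object dies, read later by live code


class Connection:
    def __init__(self, conn_id):
        self.conn_id = conn_id
        # Register a notice carrying only the id, never the object itself.
        weakref.finalize(self, death_notices.append, conn_id)


def drain_notices():
    """Called by the mutator at a moment of its own choosing."""
    released = []
    while death_notices:
        # A real program would close the matching socket here.
        released.append(death_notices.popleft())
    return released


c1 = Connection(1)
c2 = Connection(2)
c1 = None                       # the object dies; a notice is queued
pending = list(death_notices)   # no cleanup has run yet
handled = drain_notices()       # cleanup happens when the mutator asks for it
```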

Death notices share problems 1 and 4 with finalizers, but avoid problems
2 and 3.  They make problem 5 worse, because the file descriptor
allocator also needs to invoke the application's
file-descriptor-death-notice-queue-reading code --- which also brings
back problem 2 with a vengeance --- or fail.  (Unless that code is
runnable in another thread, in which case the file descriptor allocator
just needs to wait until it's done.)

Death notices can be implemented on top of finalizers.  Finalizers
cannot be implemented on top of death notices, because a finalizer has
access to the dead object, and a death notice does not.

In the halfway ground, there are near-death notices, where you have an
'owner' object with a reference to the victim object, and the GC
notifies the 'owner' object when that reference becomes the only
reference to the victim object.  The 'owner' object can then release
resources associated with the victim object before it releases its own
reference, causing that object to be finally collected.

Near-death notices share problems 1, 3, and 4 with finalizers, and
partly share problem 2.  They make problem 2 easier to handle because
the mutator can wait until it doesn't hold any locks to handle
near-death notices, even without being multithreaded.  Also, they remove
the semantic contradictions surrounding normal finalizers, such as
"resurrection".

They make problem 5 worse in the same way that death notices do.

Near-death notices can be implemented on top of finalizers, if you
aren't picky about the owner object having a reference to the dying
object until after it's dying, and finalizers can be implemented on top
of near-death notices, if you aren't picky about the finalizer running
promptly.
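
The first direction (near-death notices on top of finalizers) might look
like this in Python. This is only a sketch: Owner and Victim are made-up
names, the owner gets its reference only once the victim is already dying
(the caveat noted above), and the timing assumes CPython.

```python
class Owner:
    """Receives dying objects and cleans up when it is convenient."""

    def __init__(self):
        self.near_death = []
        self.released = []

    def notify(self, victim):
        # The owner now holds the only reference to the victim.
        self.near_death.append(victim)

    def handle_notices(self):
        # Called by the mutator, e.g. at a point where it holds no locks.
        while self.near_death:
            victim = self.near_death.pop()
            # A real program would release the resource here.
            self.released.append(victim.resource)
        # Once this function returns, no reference to the victim remains.


class Victim:
    def __init__(self, owner, resource):
        self.owner = owner
        self.resource = resource
        self.notified = False

    def __del__(self):
        # Guard in case an implementation runs the finalizer a second time.
        if not self.notified:
            self.notified = True
            self.owner.notify(self)   # hand ourselves to the owner


owner = Owner()
v = Victim(owner, resource="fd 7")
v = None                       # the finalizer hands the victim to the owner
queued = len(owner.near_death)
owner.handle_notices()
```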

Does that summarize all the substantive points of the discussion about
finalizers?  Is any of it controvertible?


There was some other discussion that wasn't really about finalizers on
the finalizers thread:

- separating destruction from deallocation in C++ and why it's difficult
- the problem with finalizers being methods of objects in Java (so
  unrelated code has to use death notices to get notification of the
  object dying, and there's at most one finalizer per object)
- the fact that some people try to use finalizers for C++-style
  "resource acquisition is initialization", which doesn't work in
  general because finalizers aren't prompt or sure, unless your 
  collector is smart enough to promptly collect objects of definite
  extent


So, my comments.

As for the problem of finalizers not being sure: if you have resources
that don't get freed when your program exits, mightn't that cause
problems in other cases, too, like a machine crash?

Do finalizers have any advantages over near-death notices, other than
problem 5 not being as bad?  It seems that near-death notices have two
advantages over finalizers: they give us more sensible terminology for
discussing cleanup, and they work in single-threaded applications.

Do near-death notices have any advantages over death notices?  It seems
that death notices would make the collector's job easier.  The only
thing I can think of is that it might simplify your design to have the
resource to be freed referenced only in the object that owns it, rather
than in both that object and its nanny object.  The only suggestion I've
seen on the list is that the mutator might want to bring in faith
healers or steal the wedding ring of the garbage object, and I can't
imagine what these actions would be analogous to in my programs.

Is it more difficult to verify code that figures out when to free
resources by hand than to prove to yourself that the resources do, in
fact, eventually get freed?  If I understand correctly, Hans-J. Boehm
asserts in his quote above that it is, but I don't understand why.
