[bitc-dev] DISCUSS: Initialization order

Jonathan S. Shapiro Tue, 05 Aug 2008 11:36:50 -0700

This is a major issue, and I *promise* that I am going to get it wrong
at least twice. :-)  At the moment, I think I have the requirements
correctly stated for causal initializer ordering, but further
restrictions may be needed in order to statically enforce those
requirements.


As usual, credit to Swaroop and blame to me.


The most important outstanding issue in BitC is pinning down what is
legal in a top-level initializing expression. This note will attempt to
nail this down, but before I try to define this, let me state some
requirements.

REQUIREMENTS:

1. The state of the program at the first instruction of main() should be
well defined, and should not depend (as in O'Caml) on the accident of
link order. In particular, we should avoid the C++ bug that constructors
on globals run in unspecified order [a fact that recently bit me].

2. In abstract, it should be possible for all initializing expressions
to be evaluated at compile time by a whole-program compiler. No compiler
should be obligated to implement this, but it should be possible for an
aggressive compiler to *choose* to implement this.

3. The result of initialization should respect causal ordering, in the
sense that any identifier referenced by an initializing expression must
be initialized before it is used.

4. Requirement [3] should be preserved in the face of separate
compilation. This is a restatement of [1], but it motivates some
restrictions on use-occurrences in initializing expressions.

5. No working program should break in the field as a consequence of a
dynamic library update, unless due to bugs in the library. In
particular, the satisfaction of initialization ordering constraints and
type checking should not lead to breakage in the field.

DISCUSSION:

Causal ordering in the presence of separate compilation is moderately
difficult to establish. We can (just barely) achieve it if we require
that initializing expressions are pure, meaning that the result of any
initializing expression is some deterministic function of the inputs to
that expression plus the pre-existing bindings in the lexical
environment of that function. If we allow initializing expressions to
have side effects, the ordering requirements must be strengthened beyond
what can be achieved under separate compilation.

   Details: certain side effects are actually okay. The particular
   side effects that cannot be permitted are side effects to objects
   in the heap (including the global bindings) that are reachable from
   the global bindings. Mutations of purely local, uncaptured state
   are actually okay.

   Perhaps a better way to say that is to emphasize my choice of wording
   of the requirement. The requirement is that the initializing
   expressions must be pure in the mathematical sense. This is not the
   same as requiring them to be functional in the sense of ML or
   Haskell.

We need to allow initializing expressions to refer to global mutable
variables (that is: to source them, not to target them). The constraint
we will impose (by construction) is that globally reachable state cannot
be modified until after all initializers have been executed.


RULES ENSURING CAUSAL ORDERING:

The following rules give us something very close to causal ordering:

  1. If an interface B imports an interface A, all symbols bound in
     interface A will be initialized before any symbol in interface B is
     initialized.

  2. At any use occurrence where a symbol X is evaluated by an
     initializing expression, that symbol must be defined (not just
     proclaimed) in the lexical environment of the use occurrence.

  3. At any use occurrence where an IMPORTED symbol X bound in an
     interface I is defined, we examine the transitive closure of
     free variables that are evaluated by the initializing expression
     of X, and we require that for each such free variable V, either

       A) V is defined (not just proclaimed) in the lexical environment
          where X is bound [That is: in the *interface* environment], OR

       B) V has been imported into the lexical environment where X is
          bound [That is: the *interface* environment], OR

       C) V is defined in the source unit of compilation where X is
          defined, and further is defined in the lexical environment at
         the point where X is defined [That is: the environment at the
         point of definition].

     This restriction is a bit subtle. What it ensures is that anything
     appearing in the initializer for X respects (transitively) the
     interface initialization ordering requirement.

     When rule (3B) applies, the interface supplying B must still be
     imported into the defining unit of compilation in order for symbol
     resolution requirements to be satisfied. The purpose of (3B) is to
     ensure that causal ordering of exported symbols cannot be violated.

  4. Every initializer expression must be a pure expression.


[INFORMAL] RULES FOR PURE EXPRESSIONS:

  1. Every literal is a pure expression

  2. If e is a pure expression, then e:T (that is, e with a stated
     type) is a pure expression provided that e:T type checks.

  3. If C is a constructor, then (C e1 ... en) is a pure expression
     exactly if e1 ... en are pure expressions.

  4. (lambda (...) ...) is a pure expression.

  5. (set! <pureloc> e) is a pure expression exactly if:

       <pureloc> is a non-escaping let-bound identifier that is bound
                 locally (including parameters, but excluding locally
                 bound identifiers captured by a closure)
       <pureloc> is x.y, and x is a pure location
       <pureloc> is x[e], x is a pure location, and e is a pure
                 expression

  6. (F e1 ... en) is a pure expression exactly if:

       F is a pure procedure, AND
       e1 ... en are pure expressions

The key part of this is the definition of a pure procedure, which is
going to take some describing.


[INFORMAL] RULES FOR PURE PROCEDURES

Because of higher-order functions in BitC, the fact that a procedure is
pure does not guarantee that the application of that procedure is pure,
and is not intended to do so. As an example, consider the usual
definition of MAP:

  (define (map f l)
    (case tmp l
      (nil  nil)
      (cons (cons (f tmp.car) (map f tmp.cdr)))))

The problem here is that MAP applies the argument procedure F, so the
purity of any application of MAP depends on the properties of its
argument procedure. The usual typing of MAP:

  (fn ((fn 'a 'b) (list 'a)) (list 'b))

doesn't tell us enough to know how F behaves.

A procedure F is pure if every expression evaluated in the execution of
F is a pure expression. In MAP, this depends on F, and we need to
express that dependency as a type restriction. We are introducing
"pure-fn" for this purpose, and the correct typing of MAP is therefore:

  (pure-fn ((pure-fn 'a 'b) (list 'a)) (list 'b))  OR
  (fn ((fn 'a 'b) (list 'a)) (list 'b))

With the intended meaning that MAP is a function as before, but with the
further observation that the application of MAP is pure provided that
the function passed to MAP is pure.


What is stated informally above works (we think), but it is probably not
a good way to capture the issue mathematically. I will attempt to do
that in my next note.


shap

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev

[bitc-dev] DISCUSS: Initialization order

Reply via email to