Hello again
Me:
What if I don't trust your kernel?
[..]
What's the downloadable object that can be machine-checked to satisfy my paranoid insurance company?
Oleg:
I hope that you can forgive me if I reply by quoting the above two paragraphs back to you, with the substitution s/kernel/compiler/.
What if I don't trust your compiler?
I have heard a similar question asked of J. Strother Moore and J. Harrison. J. Strother Moore said that most of ACL2 is built by bootstrapping, from lemmas and strategies that ACL2 itself has proven. However, the core of ACL2 just has to be trusted.
It's not an issue of forgiveness. I actively encourage such questions, of myself and other people. I'd also recommend this paper by Randy Pollack:
`How to believe a machine-checked proof' http://www.dcs.ed.ac.uk/home/rap/export/believing.ps.gz
But the one-line summary is `by rechecking it independently'. Of course, we'll never achieve _certainty_ in any absolute sense, but we can do a lot better than `I can't show you the evidence for security reasons, but there's no doubt that...'. In order to do that, you need more than a machine which says `yes'. You need a language of evidence which is simple enough that people can write their own checker in whatever way they choose. Martin-Loef's type theory provides a source of good candidates for such a language of evidence. Its core proof-checking/type-checking algorithm is very small. This is what comes out the other end of the Epigram elaborator.
[..]
Hongwei Xi's code contains the evidence I'm asking for. The verification conditions are added by hand in the program as annotations, just as yours are annotations outside the program. His are checked by Presburger arithmetic, just as yours could be.
Actually, as far as the PLDI'98 paper is concerned, they specifically say they do not use full Presburger arithmetic. Rather, they solve the constraints by Fourier variable elimination. Anyway, even if the various elaboration and decision rules are proven to be sound and complete, what is the evidence that their implementation in the extended SML compiler is also sound and complete?
Another very good question. Every time you appeal to a proof-search oracle or to an axiomatization of a library, or anything which hides the evidence, the amount of stuff you `just have to trust' gets bigger, and independent rechecking gets harder.
But what if some bright spark (volunteers welcome) were to implement Presburger Arithmetic in one of the following ways?
(1) External to the system, but outputting checkable proof terms, instead of just `yes' or `no'.
(2) Within the system, with a checkable proof that its yea be yea and its nay be nay. [Traditional implementation and correctness proof.]
(3) Within the system, with a type which actually guarantees what the algorithm proves in the first place. [The `two-level' approach.]
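To make option (1) concrete, here is a toy Haskell sketch (my own invention, nothing to do with the Epigram or DML code bases, and deciding only <= on unary naturals rather than Presburger arithmetic): the solver returns a proof term rather than a bare yes/no, and a separate, tiny checker reverifies it, so a sceptic can rewrite the checker independently.

data Nat = Z | S Nat

-- evidence that m <= n, small enough to recheck by hand
data Le = LeZero        -- Z <= n, for any n
        | LeSucc Le     -- from m <= n conclude S m <= S n

-- the "oracle": instead of a bare Bool it produces the evidence
prove :: Nat -> Nat -> Maybe Le
prove Z     _     = Just LeZero
prove (S m) (S n) = fmap LeSucc (prove m n)
prove (S _) Z     = Nothing

-- an independent checker: anyone can reimplement this however they choose
check :: Le -> Nat -> Nat -> Bool
check LeZero     Z     _     = True
check (LeSucc p) (S m) (S n) = check p m n
check _          _     _     = False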
The conclusion specifically states that the algorithm is currently incomplete.
Clearly, soundness is more important. But it is nice to have a good idea which problems the machine will get and which it won't.
Me:
...this proposition is false. The bounds function returns bounds which are outside the range of the array when the array is empty. You'll notice that Hongwei Xi's program correctly handles this case.
Don't get me wrong: I think your branded arrays and indices are a very good idea. You could clearly fix this problem by Maybe-ing up bbounds or (better?) by refusing to brand empty arrays in the first place.
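For what it's worth, a minimal Haskell sketch of the second fix (all names invented; a real library would also keep the BIndex constructor abstract and supply index arithmetic): the brander simply refuses to issue a brand for an empty array.

{-# LANGUAGE RankNTypes #-}
import Data.Array

newtype BIndex s   = BIndex Int             -- an index carrying brand s
newtype BArray s e = BArray (Array Int e)   -- an array carrying brand s

-- the only way to obtain a brand; empty arrays are turned away at the door
brandArray :: Array Int e
           -> (forall s. BArray s e -> BIndex s -> BIndex s -> r)
           -> Maybe r
brandArray a k
  | lo > hi   = Nothing                     -- empty: refuse to brand
  | otherwise = Just (k (BArray a) (BIndex lo) (BIndex hi))
  where (lo, hi) = bounds a

-- indexing with a branded index needs no further run-time check
(!.) :: BArray s e -> BIndex s -> e
(BArray a) !. (BIndex i) = a ! i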
Oleg:
I have noticed that the branding trick would work very well with number-parameterized types. The latter provide the missing guarantees, for example, they statically outlaw empty arrays. Hongwei Xi's code has another neat example: a dot product of two arrays where one array is statically known to be no longer than the other. Number-Parameterized types can statically express that inequality constraint too.
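For illustration only, a bare-bones Haskell sketch of that dot-product constraint (my own toy encoding, not the paper's: type-level Peano numbers rather than decimal digits, and plain lists standing in for genuine arrays; all names invented):

{-# LANGUAGE EmptyDataDecls, MultiParamTypeClasses, FlexibleInstances #-}
data Z        -- type-level zero
data S n      -- type-level successor

newtype Vec n a = Vec [a]    -- invariant: the list has length n

nil :: Vec Z a
nil = Vec []

cons :: a -> Vec n a -> Vec (S n) a
cons x (Vec xs) = Vec (x : xs)

-- type-level <=, playing the role of the size-inequality constraint
class Le m n
instance           Le Z n
instance Le m n => Le (S m) (S n)

-- dot product where the first vector is statically no longer than the second
dotProd :: (Num a, Le m n) => Vec m a -> Vec n a -> a
dotProd (Vec xs) (Vec ys) = sum (zipWith (*) xs ys)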
Yes, of course. And any dependently typed solution to these problems would naturally parametrize the data-structures by their size, represented as ordinary numbers. Statically expressing the constraints is easy.
The abstract `brand' is just a type-level proxy for the bounding interval, and the library of operations provides interval-respecting operations on indices. This is a very neat solution in Haskell, but it goes round an extra corner which isn't necessary with dependent types, where you can just talk about the interval directly. The library-writer would develop and verify the same convenient operations for working with intervals and indices; the proofs would be independently recheckable terms in type theory.
The Number-Parameterized types paper considers a more difficult example -- and indeed the typechecker forced me to give it a term that is verifiably a proof of the property (inequality on the sizes) stated in the term's inferred type. The typechecker truly demanded a proof; shouting didn't help.
Rightly so. What tools would be useful in such a situation?
Incidentally, the paper is being considered for JFP, I guess. I don't know if the text could be made available. I still can post the link to the code: http://pobox.com/~oleg/ftp/Haskell/number-param-vector-code.tar.gz
I should emphasize that all proper examples use genuine Haskell arrays rather than nested tuples.
[..]
There's no reason why a dependently typed language should not have genuine arrays.
Yet the type of the array includes its size, conveniently expressed in decimal notation. One can specify arithmetic equality and inequality constraints on the sizes of the arrays, in the types of the corresponding functions. The constraints can be inferred. One example specifically deals with the case when the sizes of those arrays are not known until run-time -- moreover, they change in the course of a computation that involves general recursion. It seems that the branding trick nicely complements number-parameterized arrays and makes `dynamic' cases easier to handle.
Again, all of this sounds like a good thing to do, but less like hard work if you have a language designed to support it.
I don't know why you emphasize general recursion. It's no big deal in programs which we execute only at run-time. Sure, I prefer structural recursion to general recursion, because I like to have termination for free, and also to be able to use programs safely in types. But I am not a bottom-pincher: you can have general recursion at run-time and still decide typechecking. Here's a cheap and nasty trick that does it (and we're kicking around some less nasty tricks at the moment). Add a hypothesis
general :: forall p . (p -> p) -> p
This is all you need to write general recursive programs. In Epigram, your program would invoke `general' once, and then make whatever recursive calls you like. None of these programs can loop during typechecking, because `general' has no computational behaviour. We can tell which programs/proofs to trust by checking which of them use `general'. Only when we output run-time code do we make
general f = f (general f)
Of course, if you really want to run this program in types---at your own risk---we plan to let you. There are three security levels:
paradise: a complete absence of generals
purgatory: there are generals, but their orders are not obeyed
pandemonium: what happens when you listen to generals
A compiler-switch would do, but annotations on individual functions would be better.
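By way of illustration, here is a Haskell rendering of the trick (Epigram is the intended setting; the worker function is my own made-up example). The point is that the programmer appeals to `general' exactly once and writes nothing else recursive:

general :: (p -> p) -> p
general f = f (general f)    -- this unfolding exists only in run-time code

-- a single appeal to `general'; all "recursive calls" go through `self'
collatzSteps :: Integer -> Integer
collatzSteps = general (\ self n ->
  if n <= 1 then 0
            else 1 + self (if even n then n `div` 2 else 3 * n + 1))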
So please, everybody, no more `What about general recursion? What about decidable typechecking?'. It's no big deal. Why do some people presume that Cayenne's particular problems are intrinsic to any dependently typed language?
You'll notice that Hongwei Xi's program correctly handles this case.
But what if I specified the dependent type with a mistake that overlooks the empty array case? Would the dependent ML compiler catch me red-handed? In all possible cases? Where is the formal proof of that?
I can't speak for dependent ML, but you wouldn't get away with it in Epigram. Epigram rules out _provably_ impossible cases and provides a language in which to establish this guarantee. In the mundane majority of cases, this proof amounts to `constructors of datatype indices are disjoint', which gets handled automatically.
I have failed to emphasize the parallels between Hongwei Xi's annotations and the corresponding Haskell code.
It wasn't lost on me...
[..]
Yes, there is a trusted kernel involved -- just as there is a Dependent SML system to be implicitly trusted.
I agree: the appeal to the constraint oracle is a weakness. As is the appeal to informal proofs, or axiomatizations of the library.
However, in the given example the trusted kernel is compact Haskell code plus the GHC system. The latter is complex -- but it has been used by thousands of people over an extended period of time -- and so merits higher confidence than experimental extensions (unless the latter have been formally proven -- I mean the code itself -- by a trusted system such as ACL2 or Coq).
Confidence in what? Remember that Haskell is logically unsound. I can fake a branded index by taking the head of an empty list. Fortunately, this fake will always be discovered at run-time before unsafeAt goes haywire (that ain't true for the empty array bug). However, it somewhat weakens the static guarantee!
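In case the point is unclear, a self-contained sketch (the Index name is invented, standing for whatever branded index type the library issues):

newtype Index s = Index Int   -- meant to be issued only by a trusted brander

fakeIndex :: Index s          -- typechecks at every brand s
fakeIndex = head []           -- the fake is bottom: forcing it fails at run-time,
                              -- before any unsafe indexing can go wrong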
I do not wish to sound as if I were against dependent type systems and implementations. I merely wish to point out poor man's approaches.
Don't get me wrong either: I agree with you. I'm in favour of cranking as much static confidence out of Haskell's type system as possible. And some pretty good stuff is possible (modulo bottom). However, it's very easy to pick up such programs, implemented by hook or by crook (I'm a crook) in Haskell and say: `See? Who needs dependent types?'. The truth of the matter is that these programs are rather like the dependently typed programs you might write, except that you need to use type-level proxies for values and type-level Prolog trickery, instead of ordinary values and ordinary programs. The dependently typed versions of these programs are simpler than the Haskell versions. In the short term, Haskell is the here and now, so do what you can; in the longer term, choosing type class unicycling over dependent types is a false economy.
[..]
- Try to make `safety' checks algorithmic: use newtypes to `lift' the safety checks such as range checks and see if they can be merged with the tests the algorithm has to do anyway.
Yes, this is what often happens with dependently typed programs. The key thing to ensure is that the tests have types which show their meaning statically. A Bool is a bit uninformative.
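A tiny sketch of what `informative' means here (my own invented names, not from either library): the test returns a value whose type records what was checked, rather than a Bool.

newtype InRange = InRange Int   -- invariant: checked to lie in [0, bound)

checkIndex :: Int -> Int -> Maybe InRange
checkIndex bound i
  | 0 <= i && i < bound = Just (InRange i)
  | otherwise           = Nothing
-- a phantom brand, as in the array sketch above, would additionally record
-- *which* bound the index was checked against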
- Use explicitly-quantified type variables to associate `unforgeable' types with values, so that the following property holds: the equality of such types entails the equality of the corresponding values -- even if we don't know what the values are until run-time.
Exactly: type-level proxies for values. It's the right thing to do if you can't have values in types. I'm in favour of the `datakind' idea: Haskell's type-level proxies for values should look as much like values as possible!
- Convenient (decimal) type arithmetic at compile time -- and even `semi-compile' time.
There's nothing which forces numbers in dependent type systems to be unary. That's just the path of least resistance in prototypes. Also, it reflects the importance of the inductive structure of numbers when they are being used as indices. I wonder how much this really buys you when you're developing indexed operations, as opposed to when you want to type actual values. I guess I'll read the paper.
I'm interested in just how far those principles may take me.
That's a good thing to do. I'm trying to deliver a language which makes exactly that kind of program much easier to write. I hope that sounds like a good thing too.
Thank you very much for your message!
And thank you!
All the best
Conor