I guess I should clarify. Hopefully our disagreement is not too fundamental. Also, a disclaimer: I'm not very experienced yet, and systems programming is not my specialty, so I did not and do not intend to disagree with you on performance issues.
On Mon, Apr 16, 2012 at 1:42 PM, Jonathan S. Shapiro <[email protected]> wrote:

>> At any rate, maybe a typesafe runtime for all programs is not the right
>> goal.
>
> Splat! (That's the sound of spitting something disgusting out of my mouth
> disgustedly).
>
> I have seen many runtimes that cannot be implemented efficiently in
> currently deployed commercial type systems. I have not seen any runtime
> systems that inherently need to be type-unsafe. I have not seen any runtime
> systems that would need to give up performance in order to be type-safe.
> The latter statements assume that we apply modern type knowledge to these
> systems. You can't do efficient type safety in the type systems of Java,
> C#, or ML.

The restriction as I see it is not the general idea of type safety, but the idea that there is some "one true" safe type system in which all programming can be done without giving up performance. You are most probably right that pretty much any runtime can be made type safe. But good luck finding a single type system complete enough to admit all those runtimes without modification.

The problem is that it's backwards to commit to a type system before the program you want is written. I believe we've gotten away with this so far because successful type systems are still pretty simple, or at least based on something simple, and because with simple type systems you expect to be unable to write a program exactly the way you want. When you make a type system, you've essentially chosen your well-typed programs; it's just not necessarily obvious what those programs can or can't do. For example, you discovered that the BitC type system, as implemented, does not admit the IO library you wanted.

If there's any moral to the story, and to these post-mortem discussions, maybe it's that as type systems get fancier, there are lots and lots of tricky design decisions, and a wrong decision will subtly rule out a program you want.
My goal is a better way to design type systems, so that your program, _exactly as you want it_, is well typed in your type system by construction (of the type system, not the program). A natural consequence of this goal is the idea that different components can, and sometimes should, be written using different type systems.

If my goal were reached, we would design type systems by abstracting out common arguments from provably-safe programs, not by picking a set of abstractions haphazardly and then seeing what programs they are sufficient for. This is analogous to how abstract interfaces for software components can be designed either by abstracting details of known implementations, or by intuition and good luck. This reversal is possible because of semantic approaches to type systems. In other words, a type system is an abstract interface to certain logical safety arguments. Let me explain:

The currently popular way to prove type system properties is syntactic. You reason by induction over the syntax of programs, types, and type derivations. These techniques are (relatively) straightforward and pretty powerful, but they leave the "meaning", the semantics, of types up to intuition. And intuition is often short-sighted, leading to selection of the wrong type system.

But there is also a wider family of approaches to studying type systems based on semantic modeling. Basically, you interpret features of a language (including types) as mathematical or logical objects. I'm specifically interested in approaches where:

- Data values are interpreted as objects in some model of computation
- Types are interpreted as predicates on (the interpretations of) data values
- The judgement that a value has a type is just the predicate applied to the object
- Type derivations are interpreted as proofs of (the interpretations of) type judgements

In this view, type derivations are a high-level language for formal proofs.
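To make the semantic reading concrete, here's a toy sketch in Python (my own illustration, not Appel's actual model, and all names are made up): types are literally predicates on values, the judgement "v : T" is just T(v), and a "typing rule" is a provable implication between predicates.

```python
# Toy sketch of the semantic view of types (illustration only): a "type"
# is a predicate on values, and the judgement "v : T" is simply T(v).

def Nat(v):
    """Interpretation of the type Nat: non-negative integers."""
    return isinstance(v, int) and v >= 0

def Pair(T1, T2):
    """Type constructor: the interpretation of the product T1 * T2."""
    def pred(v):
        return isinstance(v, tuple) and len(v) == 2 and T1(v[0]) and T2(v[1])
    return pred

def Subtype(T, refinement):
    """A refinement {v : T | refinement(v)} is just a stronger predicate."""
    return lambda v: T(v) and refinement(v)

Even = Subtype(Nat, lambda v: v % 2 == 0)

# "Typing rules" become implications between predicates. For instance,
# the subsumption rule  v : Even  =>  v : Nat  holds by construction,
# because Even was built by strengthening Nat.
assert Even(4)                      # 4 : Even
assert Nat(4)                       # hence 4 : Nat
assert Pair(Nat, Even)((3, 8))      # (3, 8) : Nat * Even
assert not Even(3)                  # 3 does not have type Even
```

In a real semantic model the predicates live in a formal logic (e.g. Coq) rather than in executable code, and a type derivation is a machine-checked proof that the predicate holds, not a runtime check; the sketch only shows the shape of the interpretation.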
A type checker is in effect a highly specialized automatic theorem prover. I've left out technical details and useful generalizations to get the point across.

This is the basic approach being used by the recent projects at Princeton led by Andrew Appel. Appel is using it to ground all the study of type systems, compilation, and program verification in formal logic, but I like it mainly because it provides a more flexible safe foundation than any type system, upon which useful type systems can be _discovered_, so that they don't have to be invented. You decide what things your type system should be able to reason about, and this dictates your choice of computation model in a much more direct way than the leap to a finished type system. Equipped with your computation model and a bag of formal-semantics tricks, you let your type system design itself by working through the proofs for short examples, and generalizing to yield expressive types and sound typing rules.

My current project is to build a framework and sort-of rapid prototyping tool for advanced type systems, based on a proof assistant, a low-level computation model, and semantic modeling. Even if a framework like this doesn't itself end up as the future foundation of software, I bet it will be useful for finding a "one true" type system, if one exists.

From reading some of Appel's papers, my impression is that semantic modeling is very good for coming up with type systems that handle many styles of reasoning at once. But the dependence on a fixed computation model means that a given type system still can't do everything; a program that requires an extension to the computation model would require sweeping changes to the type system model as well.

OK, well, hopefully I haven't talked your ears completely off. In case you're interested, I recommend two of Appel's papers as further reading:

- "A Logical Mix of Approximation and Separation"
  Describes a flexible program logic, and some concepts behind it.
  It's awesome 'cause tons of low-level type system features could be defined on top of it. I bet a significant chunk of BitC's type system (if not all of it) could be encoded in this logic. The program logic is itself interpreted into general-purpose logic (Coq, specifically), but that's covered in another paper, which is cited.

- "Verified Software Toolchain"
  Overview paper of a research project by the same name. The project aims to use a programming logic in the style of the above paper, designed to reason about concurrent C programs, in combination with the CompCert compiler, to verify concurrent C programs assuming only general-purpose logic. It gives an overview of how semantic modeling can reason about concurrency (more details in a cited paper) and be integrated with verified compilers.

The theme of the semantic modeling work of Appel et al. is that it's definitely powerful and flexible, but a lot of work. So my project goal could be stated as "make proof assistants better, so that it's not so much work".

>> Further, even if it is, it's restrictive; not all applications need GC.
>> For example, those programs which can be shown to be safe just using
>> region typing should not be using GC.
>
> Region safety does not guarantee timing of release or place any bound on
> heap growth. A region-safe and region-managed program may not release heap
> rapidly enough to run to completion. Regions definitely help, but they are
> not a replacement for GC.

OK, I guess I got mixed up on the details of region types, since I haven't actually studied them. I thought there was region typing based on stack allocation that didn't need GC, but maybe I'm thinking of something else. My general point, that not all programs need GC to be safe, is clearly true though. I'm arguing against one-size-fits-all solutions, so I didn't mean to imply that anything could completely replace GC.
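To illustrate the trade-off being described above (this is my own toy sketch, not any particular region type system): an arena whose allocations are freed in bulk when the region's scope exits. Safety needs no GC, but nothing is reclaimed *during* the region's lifetime, which is exactly the "may not release heap rapidly enough" problem.

```python
# Toy region/arena sketch (illustration only, not a real region system):
# allocations are tied to a lexically scoped region and freed in bulk
# when the region exits. No GC is needed for safety, but objects that
# die early still occupy memory until the whole region closes.

class Region:
    def __init__(self):
        self.objects = []
        self.closed = False

    def alloc(self, value):
        if self.closed:
            raise RuntimeError("use after region close")
        self.objects.append(value)
        return len(self.objects) - 1      # handle into this region

    def get(self, handle):
        if self.closed:
            raise RuntimeError("use after region close")
        return self.objects[handle]

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # Bulk deallocation: everything allocated in the region dies
        # here, even objects that became dead long before this point.
        self.objects.clear()
        self.closed = True
        return False

with Region() as r:
    h = r.alloc([0] * 1000)   # dead after the next line, but held until exit
    total = sum(r.get(h))
assert total == 0
```

Accessing a handle after the region closes raises an error rather than reading freed memory, which is the safety property; the cost is that peak heap usage is bounded by region structure, not by actual liveness.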
>> But of course, we want to be able to build a system out of components
>> that use GC and components that only use region-typesafe allocation.
>
> Why? I would say rather: we may be forced by the limits of a particular GC
> to do this, but what we want is to adopt the simplest regime that will
> actually solve our problem.

Yeah, but if everyone chooses their own simplest regime, how do you put it all together? You can't force everyone to use the same runtime. Maybe you can get them to use the same runtime _within any given process_, and then use IPC to put it together, but ideally it should be practical to combine them in-process, simply because it's possible. Sooner or later, you need to reason about the safe combination of multiple "simplest regimes", whether for OS correctness or for the correctness of a more flexible in-process runtime. The technical mechanisms are different, but I bet the structure of the proof is essentially the same.

>> More generally, we want to be able to build a system out of components
>> that use all sorts of memory management policies.
>> The problem then is that anyone could come up with a new and
>> reasonable memory management policy at any time, and we'd want to be
>> able to integrate it.
>
> That's a lovely image, but I think it's a bad goal.
>
> The hard experience is that this just isn't possible in practice. We have
> extended experience with mingling safe/unsafe environments. As it happens,
> we also have experience with fully safe environments where different
> processes use different GCs but can share objects in some form. Both
> introduce nearly insurmountable complexity and risk into the overall
> system.

Interesting. I don't see why it's so hard; probably I just don't know better. Can you explain? If it's complexity that the infrastructure builders deal with, I'd say it's worth it. But if it's complexity for application programmers, that's a real deal breaker.
>> I originally got interested in BitC because I thought its design goal
>> was to be a practical substrate for formal verification.
>> But now it seems like Shap is pursuing the "live dangerously" option,
>> but trying to make it less dangerous using strong, low-level types.
>
> Shap found out that formal verification often yields less confidence than
> strong typing, which makes the effort of formal methods not worth the gain.

Heheh, yeah. Badly-designed verification projects are pretty useless. But the kind of verification Appel is working on seems very well-designed to me. If an Appel-verified program fails, the only explanations are (in rough order of decreasing likelihood):

- The computation model was wrong
- The hardware malfunctioned
- Some widely-accepted logical principles are inconsistent
- The original unverified proof checker was incorrect in just the wrong way: miraculously, no one noticed anything wrong except that it accepted an invalid proof, which nonetheless seemed right but led to the specific defect you found

The computation model being wrong is arguably less likely than I implied, since in addition to making intuitive sense, a widely-adopted model will have many proofs about it which reinforce its connection to our intuitions. At any rate, it's easier to trust a simple model of computation than even the simplest language implementation, in my opinion. The last explanation is a trust bootstrapping issue; it becomes exponentially less likely as more proof checkers are successfully verified against each other.

I've explained how semantic modeling lets you justify a type system, which in turn justifies programs. If all of that is done atop a cross-verified mechanical proof checker, without anything smelling fishy, it seems awfully trustworthy to me. What do you think?

>> In summary, I think zero-pause GC is promising, and a safe runtime
>> that uses it could be extremely popular, but it will not replace C.
>
> Actually, I disagree. If a pauseless GC supporting a high enough sustained
> allocation rate becomes widely used, and a low-level type system that is
> friendly to high-performance I/O comes into place, I suspect you would find
> regulatory agencies all over the world requiring a cutover as a condition
> of certification for their member companies.

Good point, but such agencies would probably also be OK with the style of verified infrastructure I'm arguing for, assuming it proves to be practical. At any rate, if you're right, then it looks like eventually software will be safe after all. The question is how much of what's possible we can do safely in practice. I'm hoping that with the right verification technology, it will be practical to prove anything safe as long as it is safe and we intuitively get why.

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
