In light of current discussion of the tradeoff between useful code and security, just wanted to repost the summary below of basic theoretical results about what can be done, compiled with help from this list.
---------- Forwarded message ---------- From: "Paul Burchard" <paulburch...@gmail.com> Date: May 28, 2014 1:59 AM Subject: Re: [langsec-discuss] Basic results? To: "Evan Sultanik" <evan.sulta...@digitaloperatives.com> Cc: <langsec-discuss@mail.langsec.org> Evan, Thanks again for all the great references. Here is my initial take on the underlying theoretical foundations. In terms of pure language restrictions, the basic one underlying these approaches is the restriction to bounded recursion (primitive recursive functions). This seems like a useful compromise, which allows full security analysis in time bounded by the recursion bounds at the cost of reduced computational power. This is the same type of restriction mentioned by Peter from general finite state machines to acyclic ones, which appears to still have substantial applications. A different approach is to restrict the security questions that may be asked in exchange for greater (but still not Turing complete) computational power. These approaches are characterized by clear security questions but rather implicit models of computation. The static analysis tools you mention (for example to prevent recursion-based DoS) are pessimistic predictors of certain types of bad behavior; they can be thought of as defining a reduced computational model that eliminates some perfectly good code (to continue the example, eliminating any code where tainted inputs can affect recursion bounds, a sort of relativized bounded recursion). Another approach of this type pointed out to me by Matt DeMoss called "secure multi-execution" guarantees that a partially ordered set of security levels are respected, at a cost of linear time in the number of security levels (by executing the code once per security level); meanwhile, only misbehaving code has its behavior altered while conforming code can be fully Turing complete. Finally, there is the approach of not restricting the computational power but using heuristics for answering security questions, which may be successful in many practical cases but could potentially be at least exponentially expensive. You can think of this as making the set of security questions that can be feasibly answered rather implicit. The constraint solvers you mention fall into this type, for example transforming the security question into a SAT instance (which could be exponentially expensive). Let me know if I have misunderstood anything. On Mon, May 26, 2014 at 08:55:59PM -0700, Paul Burchard wrote: > What are known results mitigating the basic computer science problem that > it's normally expensive or impossible to answer the fundamental language > security question: what might some code do with the powers it's given? It depends on what question you're ultimately trying to answer. While answering the question, "What might this code do *under all possible inputs* with the powers it is given?" is usually intractable, if not undecidable, it is important to remember that solving the "dual" of the problem is often much easier: "Does there exist an input that causes this code to [perform a specific behavior]?" In other words, it is often much easier to find *at least one* certificate (i.e., input) that exercises a certain desired behavior (e.g., halting). If there are only a limited number of behaviors about which one is concerned, this approach can often be tractable, if not even efficient. I may be side-stepping your actual question here, but my point is that it can often be fruitful to constrain the set of inputs as opposed to constraining the language. There are already many program analysis techniques [1,2] like concolic execution that provide an "oracle" for answering questions like, "Does there exist a set of inputs that causes [a specific pointer] to be used after it is freed?" Constraint solvers [3,4] allow us to query for inputs that exercise almost arbitrarily complex behaviors, many of which are tractable to solve. There has also been some recent success at using machine learning to classify the behavioral characteristics (as opposed to logical flow) of black-box binaries [5]. Liveness and deadlock can also be relatively efficiently determined using solely static analysis and model-checking [6,7]. <shameless-self-promotion>My employer, Digital Operatives [8], has developed a static analysis tool for detecting liveness and performing arbitrary constraint satisfaction and optimization on LLVM/IR.</shameless-self-promotion> Now, back to your actual question. It's clear that non-Turing-complete languages can be useful. For example, Hofstadter's old BlooP [9] language can do useful computations like computing factorials, however, it explicitly disallows unbounded (infinite) loops, so the halting problem is decidable for it. A somewhat more recent---but still old---example is the CHARITY programming language [10], which uses category theory [11] to guarantee termination. Finally, you probably want to read the Hume Report [12] which describes a hierarchical system architecture that provides layers of non-Turing-completeness for security. - Evan [1] http://llvm.org/pubs/2008-12-OSDI-KLEE.html [2] http://s2e.epfl.ch/ [3] http://chicory.stanford.edu/PAPERS/STP-ganesh-07.pdf [4] http://z3.codeplex.com/ [5] http://www.blackhat.com/us-13/briefings.html#Saxe [6] http://www.cs.utexas.edu/~shmat/shmat_csf09.pdf [7] http://research.microsoft.com/en-us/um/people/lamport/tla/bo ok-02-08-08.pdf [8] http://www.digitaloperatives.com/ [9] http://en.wikipedia.org/wiki/BlooP_and_FlooP [10] http://pll.cpsc.ucalgary.ca/charity1/www/home.html [11] http://en.wikipedia.org/wiki/Category_theory [12] http://www-fp.cs.st-andrews.ac.uk/hume/report/hume-report.pdf _______________________ Evan A. Sultanik, Ph.D. Chief Scientist Digital Operatives, LLC Mobile: +1 215 919 7234
_______________________________________________ langsec-discuss mailing list langsec-discuss@mail.langsec.org https://mail.langsec.org/cgi-bin/mailman/listinfo/langsec-discuss