Re: Verifying Toolchain Semantics

William ML Leslie Mon, 06 Oct 2014 01:52:10 -0700

On 6 October 2014 11:30, Ian Grant <ian.a.n.gr...@googlemail.com> wrote:

> http://lists.gnu.org/archive/html/guile-devel/2014-10/msg00016.html
>
> From:    William ML Leslie
> Date:    Mon, 6 Oct 2014 00:57:49 +1100
> On 3 October 2014 22:56, Taylan Ulrich Bayirli/Kammer <address@hidden>
> wrote:
>
> > Say, for example, that I can guess you will end up viewing the
> > document with one of five output programs - maybe, evince, chrome,
> > tk, gecko translating xsl-fo to xul or firefox viewing html 4
> > directly, and say that I only have exploits for the first three.
> > Now I have a bit of a problem: I need to find a way to get your tool
> > to emit the specific structure that will invoke the exploit.  Since
> > it will show a document that looks correct, although underneath may
> > have a different structure, I will have to account for variation in
> > what it emits but there will be a limited class of algebraic
> > manipulations that it can make without changing the semantics of the
> > document.
>
> It's not about this. These are all the programs that have problems.
> We're not going to use these sorts of programs anymore. We're going to
> use auto-generated, nearly perfect code. And after a few years it
> won't be just near perfect, it'll be perfect.
>

Verifiable, safe renderers for these kind of formats is something I really
want to see, but if you have that, why do you need the code to have this
amorphous `semantic-fixpoint` quality?  What does it give you?

>
> > Maybe I would not find a way to /reliably/ produce a document that
> > will infect your machine.  Or maybe I would.  It's hypothetical.
> > But you know as well as I do that fuzzing a document generator to
> > see what instructions produce exploitable output is easier than
> > either writing a verifiable PDF viewer or writing a tool that
> > bisimulates a PDF document with a class of layout languages.  The
> > cherry on top is that all that extra code you introduce is surely
> > ripe with additional vulnerabilities - so maybe I don't need to
> > bother with producing those back-end exploits at all.
>
> This is wrong. If you look at the automatically-generated assembler
> code from a formal spec, in the form of a functional program
> generating that code:
>
>
> https://github.com/IanANGrant/red-october/blob/master/src/dynlibs/ffi/AbstractMachineWord.sml
>

The paragraph I wrote above talks about three things, in increasing order
of labour:

0. Fuzzing #2
1. A verifiable PDF viewer (a formal model, together with some
implementation of that model, generated or not)
2. A tool that bisimulates a veriety of layout languages

I was trying to suggest #1 as the ideal solution, as you seem to be here.
The original straw-man was #2, intended to be dual to the semantic fixpoint
you described.

>
> you will see that the extra code has no extra attack vectors. It's all
> machine-generated. If this were a piece of code for reading a binary
> file, then it would have been generated from a grammar describing
> exactly what are the valid words in that language. (Using words and
> language in the technical sense as "valid strings" and "set of all
> valid strings".) So buffer overruns won't happen. And if all the
> semantics are formalised, then off-by-one errors won't happen
> either. If they do, they'll happen all over the place and you will
> find out very quickly.
>
> > Though I think your analogy is wrong.  PDF is already an abstract
> > semantics, like any formal language spec.  The question is whether
> > an implementation realizes precisely those semantics, or adds "oh
> > and insert a backdoor" to some pieces of code in that language, "as
> > an addition to the language semantics."  This augmentation to the
> > semantics needn't be apparent in the source code of the
> > implementation, since the language it's written in might again be
> > compiled by a compiler that "augments" the semantics of the language
> > it compiles, and that in turn needn't be apparent in the compiler's
> > source ... until you arrive at the since long vanished initial
> > source code that planted the virus.
>
> > I hope by now you understand why I think this to be nonsense.  It's
> > still a C compiler.  You know that, I know that.  Why wouldn't the
> > machine be able to tell?
>
> Because it's formally undecidable. That's Rice's Theorem. And it's not
> "in practice" probably detectable by semantantic methods, because we
> are going to extra lengths to obscure the effective operational
> semantics by multiple re-encodings. How are you going to recognise a
> compiled Fortran90 program translating a C program into an abstract
> assembler like lightning, as a C compiler? You don't even know what
> the opcode numbers are: we can permute them.
>

Worse comes to worse, you feed it a small C program as input inside an
abstract interpreter.  It's the victim's CPU that is being burned, so
nontermination of the algorithm is a minor problem for the attacker, who
needs to ensure utilisation is low enough to avoid questions.

Rice's theorem applies to the entire class of programs that people can
write in a turing-complete language, but in practice, people write programs
that can be checked for denotational equivalence.  It is costly
(computationally) to do this with a general purpose programming language,
but I think you'd be surprised at the wide range of existing programs that
we can analyse that way. *

I don't think that a semantic fixed point would be problematic for
detection of C compilers.  While the elaboration you write to go from
formal semantics to concrete implementation will obviously be
non-deterministic, it will still be algorithmic, in that someone ultimately
needs to list ways that definitions can be turned into concrete
implementations.  These make up an initial algebra, the basic terms of
which can be treated individually, and in particular, the 'lossy' aspects
of the translation that should make it difficult to reverse the process
(such as introduction of spurious effects) can be categorised.  The output
program can be partially normalised, and the resulting effect graph
analysed with the effect types of the implementation terms.

If nothing else, reducing to the first few elements of the IO colist and
normalising the values involved is unlikely to be prohibitive (otherwise
the program wouldn't run at an acceptible speed anyway).

Hah - in a real bind, you don't need to detect the C compiler at all -
always add code that wraps IO calls / external effects, and augments the
output if need be.

* Can't seem to find a good paper on this at the moment, but I did enjoy
this post that has just been doing the rounds:

http://www.haskellforall.com/2014/09/morte-intermediate-language-for-super.html

  Papers saying "I tried to do it and failed" are more common, but I think
it's just a common task for the directionless PhD student.

> > The problem that I have with this discussion is that rather than
> > address many* of the points in my email, he instead resorted to
> > insisting I was a shill.  It's very difficult to have a conversation
> > with someone who uses silencing tactics.  Or better - it's
> > pointless.
>
> What are you talking about? Here is what you wrote to me:
>
>    http://lists.gnu.org/archive/html/guile-devel/2014-09/msg00036.html
>
> > Please understand that our lack of enthusiasm for it is not without
> > our own individual commitments to the security and freedoms of
> > users; rather while the extra assurances your project may provide
> > sound nice, they are frankly rather minimal and don't have much
> > additional utility, and so are not priority one.
>
> Now if you wanted me to continue answering your questions, then well,
> I'm sorry, I just got the wrong end of stick completely. I thought you
> wanted me to fuck off!
>

I see.  I can be a bit literal sometimes.  I was being honest about my
priorities in response to a request to down-tools and work on this.

>
> So I replied as below. To you, privately. And that was it. Silencing
> tactics? You then copied it to the world, and a guy who I _told_ you,
> was deputy director of the NSA's National Computer Security Centre!
>
>
I was talking about this:

http://lists.gnu.org/archive/html/guile-devel/2014-09/msg00038.html

Maybe that's not what you were trying to do.

-- 
William Leslie

Notice:
Likely much of this email is, by the nature of copyright, covered under
copyright law.  You absolutely MAY reproduce any part of it in accordance
with the copyright law of the nation you are reading this in.  Any attempt
to DENY YOU THOSE RIGHTS would be illegal without prior contractual
agreement.

Re: Verifying Toolchain Semantics

Reply via email to