(apologies if this is received repeatedly; I'm having some skill-related issues with the mailing list...)
Hello, all. Nice to see folks interested in this topic! I responded on bsky already: https://bsky.app/profile/pitrou.net/post/3mejc4q375c2d I wanted to also clarify here.

The most critical question I see here is one of input space projection. Think of it this way: all programs which consume structured data ultimately consume unstructured data (byte sequences). Programs consume this in different ways; you might consider this at different levels of the Chomsky hierarchy, e.g.:

1. Some parts of your program make sure that the data is composed of valid byte sequences (aka, lexically valid)
2. Some parts of your program make sure that the data is composed of valid structures (aka, syntactically valid)
3. Some parts of your program make sure that the data is composed of sound information (aka, semantically valid)
4. Some parts of your program then do stuff with that sound information.

Mutations from fuzzers like AFL++ work on byte sequences -- i.e., they emit and mutate unstructured data. From the previous perspective, what that means is that they mostly exercise the lexical part of the program. As syntactic or semantic complexity increases, you have less and less ability to test the higher segments of the program, because you are simply searching those parts less.

So, the next step might be to produce a fuzzer which only produces lexically valid data. You can do this with AFL++ with dictionaries! In fact, modern AFL++ compilers extract "tokens" from source code to build automatic dictionaries (autodict); most modern fuzzers now do this. (A small sketch of a dictionary file follows below.) What's interesting here is that AFL++ also retains the mutations which are lexically destructive (i.e., some mutations are still not aware of where tokens are and will corrupt those input regions). But that's actually advantageous, because it tests the code which checks for lexical validity.

The next step up would be to produce syntactically valid inputs. For this, you'll need to move up to other fuzzers; tools like grammarinator or Nautilus handle this: https://github.com/renatahodovan/grammarinator (note that these are a bit above context-free, but w/e). Just like with AFL++ fuzzing "just outside" of lexical validity, you want inputs that are "just outside" of syntactic validity. There's some more recent work on that, as well: https://doi.org/10.1145/3605157.3605170

Now, here's the rub: with unstructured and lexical mutations, you reason about the input linearly; you basically work with sequences in either case. But when we use syntactic fuzzers, we reason about the input in a tree-like format; you have some AST that you mutate rather than some sequence. What this means is that the concept of "distance" changes: two inputs which are structurally similar might be significantly different as unstructured byte sequences. This means that you _actually explore the program differently_; you are literally traversing the program's "behaviour space" (if you want to think about it so abstractly) in a different way. As a result, the way you progress through a program changes, and even if you reach the same coverage, the means by which you reached it is different, so you likely test the same code regions in different ways. See: https://doi.org/10.1145/3742894 (The Python sketch below illustrates the tree-versus-sequence point.)

Moreover, syntactic fuzzers (i.e., those which mutate on the AST) cannot easily be used in combination with lexical or unstructured fuzzers; the structured nature means that it is very difficult to ensure that a lexical mutation doesn't make the structure impossible to reason about.
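To make the dictionary step concrete: an AFL++ dictionary is just a flat file of named tokens, which the mutator splices into and around inputs. A minimal sketch (the token names and values are made up here for an XML-ish target, and ./target is a placeholder):

    # tokens.dict
    tag_open="<"
    tag_close=">"
    elem_note="note"
    attr_id="id=\""

    afl-fuzz -i seeds/ -o findings/ -x tokens.dict -- ./target @@

With autodict (the compiler-extracted dictionaries mentioned above) you get much of this for free, but hand-written entries can still be useful for tokens the compiler can't see.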
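And to illustrate the tree-versus-sequence point, here's a tiny self-contained Python sketch (this is not grammarinator or Nautilus; the toy grammar and the mutation operator are made up). It derives a tree from a grammar, applies a single subtree mutation, and lowers both trees back to bytes -- one tree edit can be an arbitrarily large byte-level edit:

    import random

    # A toy context-free grammar: nonterminal -> list of expansions.
    # The last expansion of each nonterminal is non-recursive.
    GRAMMAR = {
        "<expr>": [["<expr>", "+", "<term>"], ["<term>"]],
        "<term>": [["(", "<expr>", ")"], ["<num>"]],
        "<num>":  [["1"], ["2"], ["42"]],
    }

    def generate(symbol, depth=0):
        """Expand a symbol into a derivation tree: (symbol, children)."""
        if symbol not in GRAMMAR:
            return (symbol, [])  # terminal
        # Past a depth limit, force the last (non-recursive) expansion.
        options = GRAMMAR[symbol] if depth < 8 else GRAMMAR[symbol][-1:]
        return (symbol, [generate(s, depth + 1) for s in random.choice(options)])

    def serialize(tree):
        """Lower a derivation tree back into an unstructured string."""
        symbol, children = tree
        return symbol if not children else "".join(serialize(c) for c in children)

    def mutate(tree):
        """One tree-level mutation: regrow some random subtree."""
        symbol, children = tree
        inner = [i for i, (_, cs) in enumerate(children) if cs]
        if inner and random.random() < 0.5:
            children = list(children)
            i = random.choice(inner)
            children[i] = mutate(children[i])
            return (symbol, children)
        return generate(symbol)  # replace this whole subtree

    random.seed(1)  # fixed seed so the sketch is reproducible
    t = generate("<expr>")
    print(serialize(t), "->", serialize(mutate(t)))

A sequence-based mutator has no way to express "regrow this subtree"; it can only land on such an edit by accident. That is the change in "distance".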
When we move up to semantics, the same step happens again. Now, instead of structure, you might need to reason about the relationships between AST nodes -- oftentimes in ways disconnected from the syntax. Consider writing something like a semantic mutator for C programs. Instead of reasoning about the AST, you might prefer to reason about mutating typed dataflow graphs, which you then "lower" into an AST, which you then "lower" into lexical components, which you then "lower" into unstructured data. (See the P.S. at the end of this mail for a rough sketch.) For the same reasons as before, you want to produce inputs "just outside" of semantic validity, and the representation of inputs is not compatible with the other levels.

Different representations of program semantics matter. In the XML case, they changed from semantics of the _input_ to semantics of the _API_. This is significant because it changes, yet again, how mutations are applied. Even stuck with the admittedly very, very limited metric that is coverage, exploring this space with different "input projections" (as I've taken to calling it) is meaningful.

This, all together, is why neither one harness nor one fuzzer is enough. For starters, you need different harnesses to test different parts of the program. Then, you need different fuzzers to produce different kinds of data for each harness. Then, at a semantic level, you need different fuzzers to produce different representations of the semantics of the program under test.

If you want to get started with semantic fuzzing, IMO the best fuzzer is the one you write to accomplish your task. LibAFL or FANDANGO might help, but you'll very likely need a home-grown solution.

- https://github.com/AFLplusplus/LibAFL
- https://github.com/fandango-fuzzer/fandango

I've seen some evidence to suggest that fuzzers like FANDANGO (which is syntactic + constraints) might not be able to construct deeply semantically complex inputs vis-a-vis the dataflow example from before, so consider playing with it, but I really suggest making a higher-order input representation + mutations.

The most critical takeaway from my talk is one that you've already taken: you _need_ to think about how the fuzzer interacts with the program under test. It sounds like you folks have that covered, though! Do let me know if I can help clarify anything else, or point you to useful resources.

-- 
Addison Crump | PhD Candidate
CISPA – Helmholtz-Zentrum für Informationssicherheit gGmbH
Stuhlsatzenhaus 5, 66123 Saarbrücken, Deutschland
Room 2.17
p +49 681 87083 2876
e [email protected] | w www.cispa.de
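P.S. To make the "lowering" idea concrete, here is a very rough Python sketch (the operation set, the graph encoding, and the emitted C are all made up for illustration; this is not how LibAFL or FANDANGO do it). The mutation happens on a typed dataflow graph, where it is easy to keep inputs well-typed; only at the very end is the graph lowered into unstructured bytes:

    import random
    from dataclasses import dataclass, field

    # A toy typed dataflow graph for straight-line C programs.
    @dataclass
    class Node:
        op: str                                    # "const", "add", "addr", "deref"
        typ: str                                   # result type: "int" or "int*"
        args: list = field(default_factory=list)   # indices of producer nodes

    OPERAND_TYPES = {"add": ["int", "int"], "addr": ["int"], "deref": ["int*"]}

    def mutate(graph):
        """Semantic mutation: rewire one operand to another type-correct,
        earlier producer, so the result stays well-typed."""
        g = [Node(n.op, n.typ, list(n.args)) for n in graph]
        i = random.randrange(1, len(g))
        if g[i].args:
            slot = random.randrange(len(g[i].args))
            want = OPERAND_TYPES[g[i].op][slot]
            producers = [j for j in range(i) if g[j].typ == want]
            if producers:
                g[i].args[slot] = random.choice(producers)
        return g

    def lower(graph):
        """Lower: dataflow graph -> (implicitly) AST -> tokens -> bytes."""
        stmts = []
        for i, n in enumerate(graph):
            if n.op == "const":
                stmts.append(f"int v{i} = 42;")
            elif n.op == "add":
                stmts.append(f"int v{i} = v{n.args[0]} + v{n.args[1]};")
            elif n.op == "addr":
                stmts.append(f"int *v{i} = &v{n.args[0]};")
            elif n.op == "deref":
                stmts.append(f"int v{i} = *v{n.args[0]};")
        body = "\n  ".join(stmts + [f"return v{len(graph) - 1};"])
        return "int main(void) {\n  " + body + "\n}\n"

    seed = [Node("const", "int"), Node("const", "int"),
            Node("add", "int", [0, 1]), Node("addr", "int*", [2]),
            Node("deref", "int", [3])]
    print(lower(mutate(seed)))

Note that the mutator never sees bytes at all; producing inputs "just outside" semantic validity would mean occasionally breaking the type rule on purpose.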
