On Fri, Jul 14, 2017 at 11:17 PM, Kostya Serebryany <k...@google.com> wrote:
>>>> > Hi
>>>> >
>>>> > I wrote a test for "-fsanitize-coverage=trace-cmp" .
>>>> >
>>>> > Is there anybody tells me if these codes could be merged into gcc ?
>>>>
>>>>
>>>> Nice!
>>>>
>>>> We are currently working on Linux kernel fuzzing that use the
>>>> comparison tracing. We use clang at the moment, but having this
>>>> support in gcc would be great for kernel land.
>>>>
>>>> One concern I have: do we want to do some final refinements to the API
>>>> before we implement this in both compilers?
>>>>
>>>> 2 things we considered from our perspective:
>>>>  - communicating to the runtime which operands are constants
>>>>  - communicating to the runtime which comparisons are counting loop checks
>>>>
>>>> First is useful if you do "find one operand in input and replace with
>>>> the other one" thing. Second is useful because counting loop checks
>>>> are usually not useful (at least all but one).
>>>> In the original Go implementation I also conveyed signedness of
>>>> operands, exact comparison operation (<, >, etc):
>>>> https://github.com/dvyukov/go-fuzz/blob/master/go-fuzz-defs/defs.go#L13
>>>> But I did not find any use for that.
>>>> I also gave all comparisons unique IDs:
>>>> https://github.com/dvyukov/go-fuzz/blob/master/go-fuzz-dep/sonar.go#L24
>>>> That turned out to be useful. And there are chances we will want this
>>>> for C/C++ as well.
>>>>
>>>> Kostya, did anything like this pop up in your work on libfuzzer?
>>>> Can we still change the clang API? At least add an additional argument
>>>> to the callbacks?
>>>
>>>
>>> I'd prefer not to change the API, but extend it (new compiler flag, new
>>> callbacks), if absolutely needed.
>>> Probably make it trace-cmp-guard (similar to trace-pc-guard, with an extra
>>> parameter that has the ID).
>>> I don't like the approach with compiler-generated constant IDs.
>>
>> Yes, if we do it for C/C++, we need to create globals and pass pointer
>> to a global to the callbacks. IDs do not work for C/C++.
>>
>>> Yes, it's a bit more efficient, but much less flexible in presence of
>>> multiple modules, DSOs, dlopen, etc.
>>>
>>> I was also looking at completely inlining this instrumentation because it's
>>> pretty expensive even in it's current form
>>> (adding more parameters will make things worse).
>>> This is going to be much less flexible, of course, so I'll attack it only
>>> once I settle on the algorithm to handle CMPs in libFuzzer.
>>
>> This will require a new, completely different API for
>> compiler<->runtime anyway, so we can put this aside for now.
>>
>>
>>>> At the very least I would suggest that we add an additional arg that
>>>> contains some flags (1/2 arg is a const, this is counting loop check,
>>>> etc). If we do that we can also have just 1 callback that accepts
>>>> uint64's for args because we can pass operand size in the flags:
>>>
>>>
>>> How many flag combinations do we need and do we *really* need them?
>>>
>>> If the number of flag combinations is small, I'd prefer to have separate
>>> callbacks (__sanitizer_cov_trace_cmp_loop_bound ?)
>>>
>>> Do we really need to know that one arg is a const?
>>> It could well be a constant in fact, but compiler won't see it.
>>>
>>> int foo(int n) {   ... if (i < n) ... }
>>> ...
>>> foo(42);  // not inlined.
>>>
>>> We need to handle both cases the same way.
>>
>>
>> Well, following this line of reasoning we would need only
>> __asan_load/storeN callbacks for asan and remove
>> __asan_load/store1/2/4/8, because compiler might not know the size at
>> compile time. Constant-ness is only an optimization. If compiler does
>> not know that something is a const, fine. Based on my experience with
>> go-fuzz and our early experience with kernel, we badly need const
>> hint.
>> Otherwise fuzzer generates gazillions of candidates based on
>> comparison arguments. Note that kernel is several order of magnitude
>> larger than what people usually fuzz in user-space, inputs are more
>> complex and at the same time execution speed is several order of
>> magnitude lower. We can't rely on raw speed.
>>
>> Thinking of this more, I don't thing that globals will be useful in
>> the kernel context (the main problem is that we have multiple
>> transient isolated kernels). If we track per-comparison site
>> information, we will probably use PCs. So I am ready to give up on
>> this.
>>
>> Both of you expressed concerns about performance. Kostya says we
>> should not break existing clang API.
>> If we limit this to only constant-ness, then I think we can make this
>> both forward and backward compatible, which means we don't need to
>> handle it now. E.g. we can:
>>  - if both operands are const (if it's possible at all), don't emit any 
>> callback
>>  - if only one is const, emit __sanitizer_cov_trace_cmp_const1 and
>> pass the const in a known position (i.e. always first/second arg)
>>  - if none are const, emit callback __sanitizer_cov_trace_cmp_dyn1
>> Then compiler emits weak aliases form
>> __sanitizer_cov_trace_cmp_const/dyn1 to the old
>> __sanitizer_cov_trace_cmp1, which makes it backwards compatible.
>> New runtimes that implement __sanitizer_cov_trace_cmp_const/dyn1, also
>> need to provide the old __sanitizer_cov_trace_cmp1 for old compilers.
>
> SGTM++
>
> Although, maybe we can actually break the API this way to keep things
> simpler (no weak aliases)
> * leave __sanitizer_cov_trace_cmp[1248] for general case
> * add __sanitizer_cov_trace_cmp[1248]_const for constant in the 2-nd parameter
> * add __sanitizer_cov_trace_cmp[1248]_loop for loop bound in the 2nd parameter

This would work for us.

> * do we need __sanitizer_cov_trace_cmp[1248]_const_loop ?

We don't know yet. Potentially yes, because const is reading/writing
to a static buffer, while non-const is reading/writing to a buffer
with user-provided size.

Reply via email to