On Tue, 2021-03-02 at 23:14 +0000, brian.sobulefsky wrote:
> I have been kicking these sorts of ideas around ever since I came to
> understand that
> the second "UNKNOWN" in the for loop that originally started this was
> due to the state
> merge as we loop. For now, and I don't mean this disrespectfully
> because it is very
> hard to get right, the whole issue of merging has basically been
> punted, given some
> of the simple examples we found that will merge as an unknown svalue.
> As you think
> about this issue, "scope creep" becomes a concern quickly. It 
> quickly turns into
> a halting problem of sorts, you have to decide how much of you want
> the analyzer to
> be able to "understand" a program. For example, any human can follow
> this:
> 
> sum = 0;
> for (idx = 1; idx <= 10; idx++) sum += idx;
> __analyzer_eval (sum == 55);
> 
> but from an analyzer perspective it opens up all sorts of questions
> and
> becomes a bit of a PhD thesis as to where you draw the line. 

Challenge accepted!  FWIW with suitable options, the analyzer can
actually "figure this out":

$ cat ../../src/t.c
extern void __analyzer_eval (int);

void test (void)
{
 int sum = 0;
 for (int idx = 1; idx <= 10; idx++)
   sum += idx;
 __analyzer_eval (sum == 55);
}

$ ./xgcc -B. -S -fanalyzer ../../src/t.c \
    -Wanalyzer-too-complex \
    -fno-analyzer-state-merge \
    --param analyzer-max-enodes-per-program-point=11
../../src/t.c: In function ‘test’:
../../src/t.c:8:2: warning: TRUE
    8 |  __analyzer_eval (sum == 55);
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~

i.e. with -fno-analyzer-state-merge to disable state merging, 
and increasing the enode limit so that the analyzer effectively fully
unrolls the loop when exploring the exploded_graph.

But obviously this isn't particularly useful, except as a demo of the
internals of the analyzer.


> The biggest concern
> with the analyzer seems to be vulnerabilities, so I doubt it is
> useful to get the
> analyzer to produce the correct answer for the above code, although
> it might be
> interesting to do so from an academic perspective.

Indeed - security vulnerabilities are my highest priority (making it
easier to avoid them as code is written/patched, and to find them in
existing code).

> The example you provided gives a concrete reason that overflow should
> not be a
> complete "punt" and I like it. In the interest of fighting scope
> creep and keeping
> things manageable, I would question whether you want to actually
> track the overflow /
> no overflow cases separately or just raise any possible overflow as
> an error immediately.
> I am not disputing your idea, I would prefer to track the overflow
> and get
> a correct result (in this case, an under allocation of memory). I
> guess I would want
> to know how much work you think that will be. You still know the
> codebase a lot better
> than I do.

Brainstorming somewhat, another idea I have for handling overflow (as
opposed to the || idea you mentioned) might be to bifurcate state at
each point where overflow could occur, splitting the path into "didn't
overflow" and "did overflow" outcomes, adding conditions to each
successor state accordingly.

But maybe that would lead to a combinatorial explosion of nodes (unless
it can be tamed by merging?)

(Unfortunately, we can currently only split states at CFG splits, not
at arbitrary statements; see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99260 )

> My devil's advocate position would be if the analyzer raises
> exception on
> any possible overflow, will that overwhelm the user with false
> positives?

Presumably by "raise exception" you mean "issue a diagnostic and stop
analyzing this path", right?

I think the point is to detect where numerical overflow can lead to
e.g. a buffer overflow, rather than complain about numerical overflow
in its own right, like in the make_arr example I gave earlier.

>  I
> am not sure of the answer here, because a piece of me feels that
> overflow is not
> something that production code should be relying on in any serious
> application,
> and so should be non existent, but I am not sure if that is
> reflective of
> reality.

My belief is that production code is full of overflows, but only some
of them are security-sensitive.  Consider e.g. hashing algorithms that
sum some values and beningly assume overflow for wraparound as opposed
to the "calculate the size of the buffer to be allocated" example
(where the overflow is a classic security pitfall).

> The simplest way to handle your example would be the following:
> 
> struct foo * make_arr (size_t n)
> {
>   if (MAX_INT / sizeof (struct foo) >= n)
>     return NULL;
>   //continue with what you wrote
> }

Ideally we'd emit a fix-it hint suggesting adding such a clause (I'm
kidding, but only partly).

> This should add a constraint that downstream of the initial guard, n
> is small
> enough to prevent overflow (I would have to check, but the current
> analyzer
> should be close to doing this if not already correct). Therefore, all
> we would need
> would be the check at the definition of sz as to whether it
> overflows, and that
> check should come back negative in my example, unknown in yours.
> Assuming we agree
> that the purpose of the analyzer is to prevent vulnerabilities, and
> not to provide
> an academic exercise in seeing how close we get to solving the
> halting problem,
> we could just treat any possible overflow as an error.

Brainstorming again: perhaps a state-machine that flags svalues that
could have overflowed, and complain about "possible-overflowed" svalues
that reach a sink like "malloc"?

> As this was fundamentally your project, I don't want to tell you how
> to do it, and
> the nerd in me wants to build an analyzer that can answer all the
> silly puzzle code
> I can think to feed it, but from a utilitarian perspective and given
> everyone's
> limited time, I thought I would offer this as a path you can try.

Thanks.  As noted above, I want the analyzer to primarily be a
practical tool to help stop security vulnerabilities (though it's fun
to play with puzzle code, too, but pragmatically I want to build
something that helps C/C++ developers).

Dave


Reply via email to