Re: Constraints and branching in -fanalyzer

brian.sobulefsky via Gcc Sat, 20 Feb 2021 21:27:36 -0800

To be clear, I only solved the lesser problem

if(idx-- > 0)
  __analyzer_eval(idx >= 0);


which is a stepping stone problem. I correctly surmised that this was failing
(with the prefix operator and -= operator working as expected) because the
condition that is constrainted in the postfix problem is the old value for idx
while the condition being evaluated is the new value. I can send you a patch,
but the short version is the initial value of idx is constrained, then a 
binop_svalue
is stored and eventually ends up in __analyzer_eval. Adding a case in
constraint_manager::eval_condition to take apart binop svalues and recur
the way you are imagining would be necessary is basically all that is needed
to solve that one. Currently, the constraint_manager is just looking at
that binop_svalue and determining it does not know any rules for it, because
the rule it knows about is actually for one of its arguments.

I was hoping this would be it for the original loop problem, but like I said,
it goes from saying "UNKNOWN" twice to saying "TRUE UNKNOWN" which I
found out happens for the other operators in a for loop as well. The first
true is my binop_svalue handler, but the second UNKNOWN is the merging of
the bottom of the loop back with the entry point.

Since that is fairly abstract, when I found the case I told you about,
I decided to see if I could fix it, because merging >0 with =0 into >=0
for a linear CFG should not be too hard.



Sent with ProtonMail Secure Email.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Saturday, February 20, 2021 12:42 PM, David Malcolm <dmalc...@redhat.com> 
wrote:

> [Moving this discussion from offlist to the GCC mailing list (with
> permission) and tweaking the subject]
>
> On Sat, 2021-02-20 at 02:57 +0000, brian.sobulefsky wrote:
>
> > Yeah, its a lot to take in. For the last one, it was just about
> > storing and retrieving data and I ignored everything else about the
> > analyzer, and that was hard enough.
>
> Well done on making it this far; I'm impressed that you're diving into
> some of the more difficult aspects of this code, and seem to be coping.
>
> > I am working on PR94362, which originates from a false positive found
> > compiling openssl. It effectivly amounted to not knowing that idx >=
> > 0 within the loop for(; idx-- >0 ;).
> > It turns out there are two problems here. One has to do with the
> > postfix operator, and yes, the analyzer currently does not know that
> > i >= 0 within an if block like if(idx-- > 0). That problem was easy
> > and I got it to work within a few days with a relatively simple
> > patch. I thought I was going to be submitting again.
> > The second part is hard. It has to do with state merging and has
> > nothing to do with the postfix operator. It fails for all sorts of
> > operators when looping. In fact, the following fails:
> > if(idx < 0)
> >   idx = 0;
> > __analyzer_eval(idx >= 0);
> > which is devastating if you are hoping the analyzer can "understand"
> > a loop. Even with my first fix (which gives one TRUE when run on a
> > for loop), there is the second "iterated" pass, which ends up with a
> > widening_svalue (I'm still trying to wrap my head around that one
> > too), that gives an UNKNOWN
>
> FWIW "widening" in this context is taken from abstract interpretation;
> see e.g. the early papers by Patrick and Radhia Cousot; the idea is to
> jump ahead of an infinitely descending chain of values to instead go
> directly to a fixed point in a (small) finite number of steps. (I've
> not attempted the narrowing approach, which refines it further to get a
> tighter approximation).
>
> > So I am trying to follow how states are merged, but that means I need
> > to at least sort of understand the graphs. I do know that the actual
> > merging follows in the PK_AFTER_SUPERNODE branch, with the call to
> > node->on_edge, which eventually gets you to maybe_update_for_edge and
> > the for_each_fact iterator.
>
> I have spent far too many hours poring over graph dumps from the
> analyzer, and yes, grokking the state merging is painful, and I'm sure
> there are many bugs.
>
> Are you familiar with the various dump formats for the graph? In
> particular the .dot ones? FWIW I use xdot.py for viewing them:
> https://github.com/jrfonseca/xdot.py
> (and indeed am the maintainer of the Fedora package for it); it has a
> relatively quick and scalable UI for navigating graphs, but at some
> point even it can't cope.
> I started writing a dedicated visualizer that uses some of xdot.py's
> classes:
> https://github.com/davidmalcolm/gcc-analyzer-viewer
> but it's early days for that.
>
> > I watched a merge in the debugger yesterday for the if example above
> > and watched the unknown_svalues be made for the merged state, but it
> > was still too much to take in all at once for me to know where the
> > solution is.
>
> One other nasty problem with the state merging code is that any time I
> touch it, there are knock-on effects where other things break (e.g.
> loop analysis stops converging), and as I fix those, yet more things
> break, which is demoralizing (3 steps forward, 2 steps back).
>
> Finding ways to break problems down into smaller chunks seems to be the
> key here.
>
> It sounds like you're making progress with the:
>
> if (idx < 0)
> idx = 0;
> __analyzer_eval (idx >= 0);
>
> case. Does your fix at least work outside of a loop, without
> regressing things? (Or, if it does, what regresses?) If so, then it
> could be turned into a minimal testcase and we could at least fix that.
>
> FWIW I've experimented with better ways to handle loops in the
> analyzer. One approach is that GCC already has its own loop analysis
> framework. At the point where -fanalyzer runs, the IR has captured the
> nesting structure of loops in the code, so we might want to make use of
> that in our heuristics. Sadly, as far as I can tell, any attempts to
> go beyond that and reuse GCC's scalar-value-evolution code (for
> handling loop bounds and iterations) require us to enable modification
> of the CFGs, which is a no-no for -fanalyzer.
>
> (Loop handling is one of the most important and difficult issues within
> the analyzer implementation. That said, in the last few days I've been
> ignoring it, and have been focusing instead on a rewrite of how we find
> the shortest feasibile path for each diagnostic, since there's another
> cluster of analyzer bugs relating to false negatives in that; I hope to
> land a big fix for feasibilty handling next week - and to finish
> reviewing your existing patch (sorry!))
>
> Hope this is helpful. I'm quoting the rest of our exchange below (for
> the mailing list)
>
> Dave
>
> > Sent with ProtonMail Secure Email.
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > On Friday, February 19, 2021 5:48 PM, David Malcolm <
> > dmalc...@redhat.com> wrote:
> >
> > > On Sat, 2021-02-20 at 00:40 +0000, brian.sobulefsky wrote:
> > >
> > > > Without asking you to give a lecture on what is effectively in
> > > > the
> > > > source, can you give a 30,000 ft. view on your intent between the
> > > > "supers" and the "explodeds" (i.e. supergraph, supernode,
> > > > superedge
> > > > vs. exploded graph, exploded node, exploded edge). I understand
> > > > that
> > > > each are a derived class from your dgraph one directory level up,
> > > > I
> > > > mean, what was your design intent.
> > >
> > > Does:
> > > https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html
> > > answer your question?
> > > The main point of the supergraph is to combine all of the CFGs into
> > > one
> > > big directed graph.
> > > The point of the exploded graph is to combine that with dataflow,
> > > or,
> > > at least, "interesting" aspects of data flow, where "interesting"
> > > roughly means "differences in state-machine state".
> > > IIRC, both of these are from
> > > "Precise Interprocedural Dataflow Analysis via Graph Reachability"
> > > (Thomas Reps, Susan Horwitz and Mooly Sagiv),
> > > though I'm not using the algorithm they're using, more the high-
> > > level
> > > ideas of the two above paragraphs.
> > > Having an exploded graph means I can generate sequences of events
> > > by
> > > considering paths through the exploded graph.
> > > Does this clarify things?
> > > (BTW, yes; I've seen your latest patch; I hope to get to it
> > > tomorrow)
> > > Dave
> > >
> > > > Sent with ProtonMail Secure Email.
> > > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > > On Tuesday, February 16, 2021 3:10 PM, David Malcolm <
> > > > dmalc...@redhat.com> wrote:
> > > >
> > > > > On Tue, 2021-02-16 at 22:15 +0000, brian.sobulefsky wrote:
> > > > >
> > > > > > Hi David,
> > > > > > I am having a hard time understanding how the
> > > > > > constraint_managers
> > > > > > update with various types of branching. Usually I can figure
> > > > > > things
> > > > > > out by following it at a particular memory location (an
> > > > > > svalue or
> > > > > > region at a particular memory location never changes kind and
> > > > > > can
> > > > > > be
> > > > > > read later and get the value stored there). I am being
> > > > > > defeated
> > > > > > by
> > > > > > the constraint_manager. I will watch two equivalence classes
> > > > > > and
> > > > > > a
> > > > > > constraint be added and then on the next pass see that there
> > > > > > are
> > > > > > no
> > > > > > classes or constraints in that manager. Is there an obvious
> > > > > > reason
> > > > > > that this would happen?
> > > > >
> > > > > constraint_manager instances get copied a lot: typically at
> > > > > each
> > > > > new
> > > > > state you're working with a fresh instance that got created via
> > > > > the
> > > > > copy constructor from the previous state.
> > > > >
> > > > > > FWIW, I was expecting the way it worked to be that when you
> > > > > > reach
> > > > > > a
> > > > > > branch at say, an if or a for loop condition, two managers
> > > > > > are
> > > > > > created, one for the condition and one for the negation of
> > > > > > the
> > > > > > condition. Is this be how it works?
> > > > >
> > > > > Roughly. The constraint_manager instances are "owned" by
> > > > > region_model
> > > > > instances, themselves within program_state instances.
> > > > > At a branch, within exploded_graph::process_node, for
> > > > > PK_AFTER_SUPERNODE it iterates through successor supernodes,
> > > > > creating a
> > > > > (next_point, next_state) pair for each outedge, and calls
> > > > > exploded_node::on_edge on them.
> > > > > This in turn calls program_state::on_edge, which applies the
> > > > > edge
> > > > > flags
> > > > > to the program state, and, in particular, the
> > > > > constraint_manager;
> > > > > either adding constraints, or rejecting the edge as infeasible.
> > > > > At
> > > > > an
> > > > > if statement, one edge will have EDGE_FLAG_TRUE, the other
> > > > > EDGE_FLAG_FALSE.
> > > > > The output of the supergraph .dot dump shows the edge flags.
> > > > > Hope that makes sense.
> > > > > (one drawback of this approach is we can't bifurcate state
> > > > > within a
> > > > > node, only at edges. Ideally I'd like to eventually fix that,
> > > > > but
> > > > > it
> > > > > would be a big reworking)
> > > > >
> > > > > > Thanks,
> > > > > > Brian
> > > > > > Sent with ProtonMail Secure Email.

Re: Constraints and branching in -fanalyzer

Reply via email to