On 07/24/2018 11:05 AM, David Malcolm wrote:
On Mon, 2018-07-23 at 20:56 -0600, Martin Sebor wrote:
On 07/23/2018 07:20 PM, David Malcolm wrote:
On Mon, 2018-07-23 at 17:49 -0600, Martin Sebor wrote:
(David, I'm hoping your your help here.  Please see the end.)

While looking into a recent -Warray-bounds instance in Glibc
involving inlining of large functions it became apparent that
GCC could do a better job of pinpointing the source of
the problem.

The attached patch makes a few adjustments to both
the pretty printer infrastructure and to VRP to make this
possible.  The diagnostic pretty printer already has directives
to print the inlining context for both tree and gcall* arguments,
so most of the changes just adjust things to be able to pass in
gimple* argument instead.

The only slightly interesting change is to print the declaration
to which the out-of-bounds array refers if one is known.

Tested on x86_64-linux with one regression.

The regression is in the gcc.dg/Warray-bounds.c test: the column
numbers of the warnings are off.  Adding the %G specifier to
the array bounds warnings in VRP has the unexpected effect of
expanding the extent of the underling. For instance, for a test
case like this:

   int a[10];

   void f (void)
   {
     a[-1] = 0;
   }

from the expected:

    a[-1] = 0;
    ~^~~~

to this:

   a[-1] = 0;
    ~~~~~~^~~

David, do you have any idea how to avoid this?

Are you referring to the the various places in your patch (in e.g.
  vrp_prop::check_array_ref
  vrp_prop::check_mem_ref
  vrp_prop::search_for_addr_array
) where the patch changed things from this form:

  warning_at (location, OPT_Warray_bounds,
              "[...format string...]", ARGS...);

to this form:

  warning_at (location, OPT_Warray_bounds,
              "%G[...format string...]", stmt, ARGS...);

Yes.


If so, there are two location_t values of interest here:
(a) the "location" value, and
(b) gimple_location (stmt)

My recollection is that %G and %K override the "location" value
passed
in as the first param to the diagnostic call, overwriting it within
the
diagnostic_info's text_info with the location value from the %K/%G
(which also set up the pp_ti_abstract_origin of the text_info from
the
block information stashed in the ad-hoc data part of the location,
so
that the pretty-printer prints the inlining chain).

Would having the pretty printer restore the location and
the block after it's done printing the context and before
processing the rest of the format string fix it?  (I have
only a vague idea how this all works so I'm not sure if
this even makes sense.)

Structurally, it looks like this:

Temporaries during the emission of   |  Long-lived stuff:
the diagnostic:                      |
                                     |    +---------+
+----------------+                   |    |global_dc|
|diagnostic_info |                   |    +---------+
|+------------+  |                   |
||text_info:  |  |                   |
||  m_richloc-+--+---> rich_location |
||  x_data----+--+-------------------+--> block (via pp_ti_abstract_origin)
|+------------+  |                   |
+----------------+                   |
                                     |

The location_t of the diagnostic is stored in the rich_location.

Calling:
  warning_at (location)
creates a rich_location wrapping "location" and uses it as above.

During formatting, the %K/%G codes set text_info.x_data via
pp_ti_abstract_origin and overwrite the location_t in the
rich_location.

So in theory we could have a format code that sets the block and
doesn't touch the rich_location.  But that seems like overkill to me.

I wasn't thinking of a new format.  Rather, I thought the %K
would save the current block and location (set by the location
argument to warning_at), then after printing the inlining stack
but before printing the rest of the diagnostic the printer would
restore the saved block and location.  I still don't know enough
to tell if it would work.

In any event, if it's easier to always print the inlining stack
and get rid of %K and %G then that would be preferable.  I don't
think they are used for any other purpose (i.e., they are always
used as the first directive in a format string).

[aside, why don't we always just print the inlining chain?  IIRC,
%K
and %G feel too much like having to jump through hoops to me, given
that gimple_block is looking at gimple_location anyway, why not
just
use the location in the location_t's ad-hoc data; I have a feeling
there's a PR open about this, but I don't have it to hand right
now].

That would make sense to me.  I think that's also what we
agreed would be the way forward the last time we discussed
this.

(nods)

So how do we go about making this happen?  Somewhat selfishly
I was sort of waiting for you to take the lead on it since
you're much more familiar with the code than I am :)  But
that doesn't mean I can't try to tackle it myself.  If it seems
like something you can fit into your schedule sometime soonish
let me know.  Otherwise I'll see if I can figure it out myself.
In the meantime, I'll proceed with the rest of this patch sans
the %G bits.

It looks like the old location was (a), and now you're seeing (b),
since (b) is the location of the statement.

Yes, I think that's it.


Whether or not this is a problem is difficult to tell; what does
the
full diagnostic look like? (you only quoted the
diagnostic_show_locus
part of it).

It looks like this:

e.c: In function ‘f’:
e.c:5:9: warning: array subscript -1 is below array bounds of
‘int[10]’
[-Warray-bounds]
    a[-1] = 0;
    ~~~~~~^~~
e.c:1:5: note: while referencing ‘a’
  int a[10];
      ^

FWIW, either of the locations/underlining above seem fine to me,

The underling in the warning isn't right.  It doesn't matter so
much in the straightforward case of a one-dimensional array but
it becomes a problem when the warning involves a mult-dimensional
array.  Contrast the output for this code:

  const int i0 = 5, k0 = 0, i1 = 0, k1 = 5;

  int a[2][3];

  void f (void)
  {
    a[i0][k0] = 0;
    a[i1][k1] = 0;
  }

With my patch (without the note) we get:

e.c:7:13: warning: array subscript 5 is above array bounds of ‘int[2][3]’ [-Warray-bounds]
     a[i0][k0] = 0;
     ~~~~~~~~~~^~~
e.c:8:13: warning: array subscript 5 is above array bounds of ‘int[3]’ [-Warray-bounds]
     a[i1][k1] = 0;
     ~~~~~~~~~~^~~

when without the patch we get:

e.c:7:4: warning: array subscript 5 is above array bounds of ‘int[2][3]’ [-Warray-bounds]
     a[i0][k0] = 0;
     ~^~~~
e.c:8:8: warning: array subscript 5 is above array bounds of ‘int[3]’ [-Warray-bounds]
     a[i1][k1] = 0;
     ~~~~~^~~~

With the patch the warning doesn't clearly indicate which of
the two indices is out of bounds.  I don't think we should
lose that feature.

but I
feel the wording of the diagnostic could be improved.  When I first
read:

  array subscript -1 is below array bounds of ‘int[10]’

I found myself pausing to think about the message; the "10" in the
message confused me for a moment, and then I realized that it's
referring to the *lower* bound of the array.  I don't know how
"standardese" refers to this, but I think about it as the "start" of
the array.  So maybe the messages could read something like:

  out of bounds access to array ‘int[10]’: subscript -1 is before the
start of the array

I agree that the warnings could be tweaked.  With the inclusion
of the value of the out-of-bounds index (or its range) and with
the type of the referenced array in the diagnostic it's no longer
necessary to mention whether the index is above or below bounds.
It's enough to say that it's out of bounds.  It will not only
obviate questions like the one above but also simplify the code
(and also simplify writing tests).


or something.  In a similar vein, is it easy to distinguish reads from
writes, which might give:

  out of bounds write to array ‘int[10]’: subscript -1 writes to before
the start of the array

vs e.g.:

  out of bounds read from array 'int[10]': subscript 10 reads from
after 'a[9]', the final element of the array

and similar (I'm brainstorming here).

Thanks for your comments -- they're always helpful!

Martin

Reply via email to