On Fri, 2020-05-08 at 15:35 -0400, Lewis Hyatt wrote:
> On Fri, Jan 31, 2020 at 03:31:59PM -0500, David Malcolm wrote:
> > On Fri, 2020-01-31 at 14:31 -0500, Lewis Hyatt wrote:
> > > Hello-
> > > 
> > > Here is the second patch that I mentioned when I submitted the
> > > other
> > > related
> > > patch (which is awaiting review):
> > > https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01626.html. 
> > 
> > Sorry about that; I'm v. busy with analyzer bugs right now.
> > 
> > > This second patch
> > > is based on top of the first one and it closes out PR49973 and
> > > PR86904 by
> > > adding the new option -fdiagnostics-column-unit=[display|byte].
> > > This
> > > allows
> > > to specify whether columns are output as simple byte counts (the
> > > current
> > > behavior), or as display columns including handling multibyte
> > > characters and
> > > tabs. The patch makes display columns the new default.
> > > Additionally,
> > > a
> > > second new option -fdiagnostics-column-origin is added, which
> > > allows
> > > to make
> > > the column 0-based (or N-based for any N) instead of 1-based. The
> > > default
> > > remains at 1-based as it is now.
> > > 
> > > A number of testcases were explicitly testing for the old
> > > behavior,
> > > so I
> > > have updated them to test for the new behavior instead, since the
> > > column
> > > number adjusted for tabs is more natural to test for, and matches
> > > what
> > > editors typically show (give or take 1 for the origin
> > > convention).
> > > 
> > > One other testcase (go.dg/arrayclear.go) was a bit of an oddity.
> > > It
> > > failed
> > > after this patch, although it doesn't test for any column
> > > numbers.
> > > The
> > > answer turned out to be, this test checks for identical error
> > > text on
> > > two
> > > different lines. When the column units are changed to display
> > > columns, then
> > > the column of the second error happens to match the line of the
> > > first
> > > one. dejagnu then misinterprets the second error as if it matched
> > > the
> > > location of the first one (it doesn't distinguish whether it
> > > checks
> > > for the
> > > line number or the column number in the output). I added a
> > > comment to
> > > the
> > > test explaining the situation; since adding the comment has the
> > > side
> > > effect
> > > of making the first line number no longer match the second column
> > > number, it
> > > also makes the test pass again.
> > > 
> > > It wasn't quite clear to me whether this change was appropriate
> > > for
> > > GCC 10
> > > or not at this point. We discussed it a couple months ago here:
> > > https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02171.html. Either
> > > way,
> > > I hope
> > > it isn't a problem that I submitted the patch for review now,
> > > whether
> > > it
> > > will end up in 10 or 11. Please let me know what's normally
> > > expected?
> > > Thanks!
> > 
> > Thanks Lewis.
> > 
> > This patch looks very promising, but should wait until gcc 11;
> > we're
> > trying to stabilize gcc 10 right now (I'm knee-deep in analyzer
> > bug-
> > fixing, so I don't want to add any more diagnostics changes).
> > 
> 
> Hi Dave-
> 
> Well GCC 10 was released for a whole day so I thought I would bug you
> with this
> patch again now :). To summarize, I previously sent this in two
> separate parts.
> 
> Part 1: 
> https://gcc.gnu.org/legacy-ml/gcc-patches/2020-01/msg01626.html
> Part 2: 
> https://gcc.gnu.org/legacy-ml/gcc-patches/2020-01/msg02108.html
> 
> Part 1 added the support for converting tabs to spaces when
> outputting
> diagnostics. Part 2 added the new options -fdiagnostics-column-unit
> and
> -fdiagnostics-column-origin to control whether the column number is
> printed
> in display or byte units. Together they resolve both PR49973 and
> PR86904.
> 
> You provided me with feedback on part 2, which is quoted below with
> some
> notes interspersed. The new version of the patch incorporates all of
> your
> suggestions. Part 1 has not changed other than some trivial rebasing
> conflicts. The two patches touch nearly disjoint sets of files and
> are
> logically linked together, so I thought it would be simpler if I just
> sent
> one combined patch now. If you prefer them to be separated as before,
> please
> let me know and I can send them that way as well.
> 
> Bootstrap and reg tests were done on x86-64 Linux for all
> languages.  Tests
> look good:
> 
> type, before, after
> FAIL 96 96
> PASS 474637 475097
> UNSUPPORTED 11607 11607
> UNTESTED 195 195
> XFAIL 1816 1816
> XPASS 36 362

Thanks for the patch; sorry about the delay in reviewing it.

Some high-level review points

- I like the patch overall

- This will deserve an item in the release notes

- I don't like adding "global_tabstop" (I don't like global
variables).  Is there nowhere else we can handle this? I believe
there's a cluster of functions in the callgraph that make use of
it; can we simply pass around the tabstop value instead?  "tabstop"
seems to have several meanings.  If I'm reading the patch correctly
  * "tabstop > 0" means to expand tabs so that column numbers are a
multiple of tabstop
  * "tabstop == 0" means "don't expand tabs"
  * "tabstop < 0" in some places means: use the global_tabstop value
Is it possible to eliminate global_tabstop value?  Or is there some
deep reason I'm missing?

I'll do a more thorough review once that's addressed/resolved (since
eliminating global_tabstop might touch a few places).

Thanks for adding docs; some nits on them:

> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi

[...snip...]

> +@item -fdiagnostics-column-unit=@var{UNIT}
> +@opindex fdiagnostics-column-unit
> +Select the units for the column number.  This affects traditional diagnostics
> +(in the absence of @option{-fno-show-column}), as well as JSON format
> +diagnostics if requested.
> +
> +The default @var{UNIT}, @samp{display}, considers the number of display 
> columns
> +occupied by each character.  This may be larger than the number of bytes
> +occupied, in the case of tab characters, or it may be smaller, in the case of
> +multibyte characters.  For example, the UTF-8 character ``@U{03C0}'' occupies
> +two bytes and one display column, while the character ``@U{1F642}'' occupies
> +four bytes and two display columns.

This is imprecise.  A unicode code point occupies some number of display 
columns,
and its *UTF-8 encoding* occupies some number of bytes.

[and my inner pedant is now thinking: what about combining diacritics? 
But I don't think we can ever issue a diagnostic on a diacritic; I
*think* we only ever care about the per-glyph level]

> +Setting @var{UNIT} to @samp{byte} changes the column number to the
raw byte
> +count in all cases, as was traditionally output by GCC prior to version 
> 11.1.0.
> +
> +@item -fdiagnostics-column-origin=@var{ORIGIN}
> +@opindex fdiagnostics-column-origin
> +Select the origin for column numbers, i.e. the column number assigned to the
> +first column.  The default value of 1 corresponds to traditional GCC
> +behavior and to the GNU style guide.  Some utilities may perform better with 
> an
> +origin of 0; any non-negative value may be specified.
> +
>  @item -fdiagnostics-format=@var{FORMAT}
>  @opindex fdiagnostics-format
>  Select a different format for printing diagnostics.

[...snip...]

> +A diagnostic can contain zero or more locations.  Each location has an
> +optional @code{label} string and up to three positions within it: a
> +@code{caret} position and optional @code{start} and @code{finish} positions.
> +A position is described by a @code{file} name, a @code{line} number, and
> +three numbers indicating a column position: @code{display-column} counts
> +display columns, accounting for tabs and multibyte characters;
> +@code{byte-column} counts raw bytes; and @code{column} is equal to one of
> +the previous two, as dictated by the @option{-fdiagnostics-column-unit}
> +option.

Might be clearer to use an unordered list here for the three kinds of column.

> All three columns are relative to the origin specified by
> +@option{-fdiagnostics-column-origin}, which is typically equal to 1 but may
> +be set, for instance, to 0 for compatibility with other utilities that
> +number columns from 0.  The column origin is recorded in the JSON output in
> +the @code{column-origin} tag.  In the remaining examples below, the extra
> +column number outputs have been omitted for brevity.

[...snip...]


Thanks again for the patch; hope this is constructive
Dave

Reply via email to