[Bug other/80437] large decimal numbers in diagnostics are hard to read

dmalcolm at gcc dot gnu.org Tue, 01 Aug 2017 13:11:57 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80437


David Malcolm <dmalcolm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmalcolm at gcc dot gnu.org

--- Comment #1 from David Malcolm <dmalcolm at gcc dot gnu.org> ---
(In reply to Martin Sebor from comment #0)

[...snip...]

> bug.c:11:5: warning: 'memset': specified size 0xfffffffffffffffb exceeds
> maximum object size 0xffffffffffffffff [-Wstringop-overflow=]
> 
> I'm not sure that this a significant improvement.  Those already familiar
> with the -Wstringop-overflow warning will likely understand what
> 0xffffffffffffffff in this context means but only because we know the
> maximum object size limit (i.e., PTRDIFF_MAX) and realize that all printed
> values are in the [PTRDIFF_MAX + 1, SIZE_MAX] range and thus always consist
> of 16 hex digits.  Someone who's seen the warning for the first time will
> either have to guess or count the f's.  This is even more likely for the
> specified size (such as 0xfffffffffffffffb).  In cases where a much lower
> limit is specified by the user (e.g., via -Walloca-larger-than) it's even
> less clear how to interpret a number in any base.
> 
> I think it's possible to do better.  One approach is to print very large
> values in terms of well-known constants such as SIZE_MAX or PTRDIFF_MAX. 
> For instance, instead of printing 18446744073709551611 (i.e., -5) print
> SIZE_MAX - 4.  Another solution might be to print sizes as signed (though
> that won't help in the case of the user-specified limit).

How about printing *both* i.e.:

bug.c:11:5: warning: 'memset': specified size 0xfffffffffffffffb (SIZE_MAX - 4)
exceeds maximum object size 0xffffffffffffffff (PTRDIFF_MAX)
[-Wstringop-overflow=]

(I may have got the expressions wrong, but hopefully the meaning is clear)
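
A minimal sketch of the "print both" idea, in Python for brevity rather than as GCC code. It assumes a 64-bit target, where SIZE_MAX is 2^64 - 1 and PTRDIFF_MAX is 2^63 - 1; the helper name `annotate` and the `window` parameter are hypothetical, chosen only for illustration.

```python
# Assumption: 64-bit target, so SIZE_MAX = 2**64 - 1 and PTRDIFF_MAX = 2**63 - 1.
SIZE_MAX = 2**64 - 1
PTRDIFF_MAX = 2**63 - 1


def annotate(value, window=16):
    """Return VALUE in hex, followed by a symbolic form when it lies
    within WINDOW of a well-known constant (hypothetical helper)."""
    text = hex(value)
    for name, const in (("SIZE_MAX", SIZE_MAX), ("PTRDIFF_MAX", PTRDIFF_MAX)):
        delta = value - const
        if abs(delta) <= window:
            if delta == 0:
                return f"{text} ({name})"
            sign = "+" if delta > 0 else "-"
            return f"{text} ({name} {sign} {abs(delta)})"
    # Not close to any known constant: print the plain hex value.
    return text


print(annotate(0xfffffffffffffffb))
print(annotate(PTRDIFF_MAX))
```

How wide the window should be before the symbolic form stops helping is an open question; 16 here is a guess.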

> Since the problem of how best to present large decimal numbers is general
> and applies to all diagnostics, including warnings, errors, and notes, a
> change to how these numbers are presented should be brought up for a wider
> discussion before it's implemented consistently, for all diagnostics.

I find large decimal numbers intimidating, and find hexadecimals easier for
values close to large powers of two.

Suggestion: choose base based on a "mental effort cost":

Example 1
*********
For example, if we have an overflow that occurs when x >= 2^31,
which is easier to read:

DECIMAL:
  warning: buffer overflow occurs when x >= 2147483648

HEX:
  warning: buffer overflow occurs when x >= 0x80000000

FORMULA:
  warning: buffer overflow occurs when x >= 2^31

FORMULA and HEX:
  warning: buffer overflow occurs when x >= 2^31 (0x80000000)

Example 2
*********
an overflow that occurs when x >= 100

DECIMAL:
  warning: buffer overflow occurs when x >= 100

HEX:
  warning: buffer overflow occurs when x >= 0x64

In the above case, decimal is the easier-to-read format.

Example 3
*********

an overflow that occurs when x >= 0x7fff0000

DECIMAL:
  warning: buffer overflow occurs when x >= 2147418112

HEX:
  warning: buffer overflow occurs when x >= 0x7fff0000

In this case, hexadecimal is the easier-to-read format.

Example 4
*********

an overflow that occurs when x <= -8000

DECIMAL:
  warning: buffer overflow occurs when x <= -8000

HEX:
  warning: buffer overflow occurs when x <= -0x1f40


The idea
********

The idea is to choose the printed representation based on the value itself,
specifically on the number of "awkward" digits it has in each base.

One implementation is to assign a cost to each digit based on its closeness to
zero.

For example, in decimal,
  '0' : low cost
  '1', '9': medium cost
  '2'..'8': high cost

in hexadecimal:
  '0' : low cost
  '1', 'f': medium cost
  '2'..'e': high cost

We can weight these, say cost 10 for "high", cost 1 for "medium", cost 0 for
"low".

"Cheaper" in this sense should mean "easier for a human to understand"; a rough
measure of the amount of mental effort required by a human reader.
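
The per-digit scoring above could be sketched as follows (Python for brevity; the function names `digit_cost` and `string_cost` are hypothetical, and the weights are the guessed 0/1/10 values):

```python
def digit_cost(ch, base, low=0, medium=1, high=10):
    """Cost of a single digit character CH in the given BASE."""
    d = int(ch, base)
    if d == 0:
        return low
    if d == 1 or d == base - 1:   # '1' and '9' in decimal, '1' and 'f' in hex
        return medium
    return high


def string_cost(digits, base):
    """Total cost of a digit string (no sign, no "0x" prefix)."""
    return sum(digit_cost(ch, base) for ch in digits)


print(string_cost("100", 10))   # 1 medium + 2 low
print(string_cost("64", 16))    # 2 high
```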

Hence:

example 1:
  decimal: 2147483648
    10 digits, 9 high cost, 1 medium cost: cost = 91

  hexadecimal: 0x80000000
    8 digits; 1 high cost, 7 low cost: cost = 10

  hence hexadecimal is "cheaper", and we use it

example 2:
  decimal: 100
    3 digits, 1 medium cost, 2 low cost: cost = 1

  hexadecimal: 0x64
    2 high cost digits: cost = 20

  hence decimal is "cheaper", and we use it

example 3:
  decimal: 2147418112
    10 digits: 4 medium cost, 6 high cost: cost = 64

  hexadecimal: 0x7fff0000
    8 digits: 1 high cost, 3 medium cost, 4 low cost: cost = 13

  hence hexadecimal is "cheaper", and we use it

example 4:
  decimal: -8000
    3 low cost digits, 1 high cost: cost = 10

  hexadecimal: -0x1f40
    1 low cost, 2 medium cost, 1 high cost: cost = 12

  hence decimal is "cheaper", and we use it

I guessed at these weightings; there may be better ones.
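
For experimenting with other weightings, here is a self-contained sketch of the selection step (Python for brevity, not GCC code). The names `cost` and `render` are hypothetical, and breaking ties in favour of decimal is an assumption, since none of the examples above ties.

```python
def cost(value, base, weights=(0, 1, 10)):
    """Sum the per-digit costs of abs(VALUE) printed in BASE (10 or 16).
    WEIGHTS is (low, medium, high)."""
    low, medium, high = weights
    digits = format(abs(value), "x" if base == 16 else "d")
    total = 0
    for ch in digits:
        d = int(ch, base)
        if d == 0:
            total += low
        elif d in (1, base - 1):
            total += medium
        else:
            total += high
    return total


def render(value, weights=(0, 1, 10)):
    """Print VALUE in whichever base is "cheaper"; ties go to decimal
    (an assumption)."""
    sign = "-" if value < 0 else ""
    dec = f"{sign}{abs(value)}"
    hexa = f"{sign}{abs(value):#x}"
    if cost(value, 16, weights) < cost(value, 10, weights):
        return hexa
    return dec


# The four examples from above.
for v in (2**31, 100, 0x7fff0000, -8000):
    print(render(v))
```

With the default weights this picks hexadecimal for examples 1 and 3 and decimal for examples 2 and 4, matching the hand calculations.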
