On 11/10/2021 00:34, Paul Eggert wrote:
The warnings look good, except that this one:

    $ printf '1.0\n0.9\n' | sort -s -k1,1g --debug
    sort: numbers use ‘.’ as a decimal point in this locale
    0.9
    ___
    1.0
    ___

seems overkill if we're in the C locale.

Also, shouldn't similar diagnostics be generated if the field separator
is '-', or '+', or a digit in the current locale?

Yes this is more informational than a warning.
As Bernhard mentioned it may be useful to tag
--debug messages as informational or warnings.

In this case it would be info:
but would change to warn: if (tab == decimal_point).

The reason for the info message is that it may not
be obvious to users that numeric comparison
depends on locale just like text,
and we already provide the informational
text comparison message indicating the current locale.
We would only show this info: message if doing numeric sorting.

+  if (numeric_field_span)
+    {
+      char sep_string[2] = { 0, };
+      sep_string[0] = thousands_sep;
+      if ((tab == TAB_DEFAULT
+           && (isblank (to_uchar (thousands_sep))))
+          || tab == thousands_sep)
+        {
+          error (0, 0,
+                 _("field separator %s is treated as a "
+                   "group separator in numbers"),
+                 quote (sep_string));
+          number_locale_warned = true;
+        }
+    }

This code brought it to my attention that the GNU 'sort' has had a
longstanding bug (in code that I wrote long ago - sorry!) in that
thousands_sep is either -1 or an unsigned char converted to int, and
this doesn't work in some unusual cases. I installed the attached patch
to fix that bug, and I vaguely suspect that it fixes similar bugs in GNU
'test' and GNU 'expr'. Good thing you brought it to my attention.
(Sorry, I'm too lazy and/or time-pressed and/or overconfident to write
test cases....)

I'd noted this and was going to follow up on it.
Thanks for sorting it out!

Anyway, with that patch installed, TAB and THOUSANDS_SEP can both be
CHAR_MAX + 1 so the above code needs to be twiddled. Also, we can assume
C99. So, something like following (pardon Thunderbird's line wrap):

    if (numeric_field_span
        && (tab == TAB_DEFAULT
          ? thousands_char != NON_CHAR && isblank (to_uchar (thousands_sep))
          : tab == thousands_sep))
      {
        error (0, 0,
             _("field separator %s is treated as a group separator in numbers"),
             quote (((char []) {thousands_sep, 0})));
        number_locale_warned = true;
      }

with a similar replacement to the decimal_point code.

cheers,
Pádraig



Reply via email to