Hi Stuart, hi Mark, hi Jason,

Stuart Henderson wrote on Wed, Mar 12, 2025 at 03:11:27PM +0000:
> On 2025/03/12 14:54, Mark Kettenis wrote:

>> Well, that makes some sort of sense if you interpret the strings as
>> floating point numbers and ignore everything after as garbage.

> GNU's implementation of sort behaves exactly the same with -h and -n,
> their manual says "output only the first of an equal run".
> 
> posix says "suppress all but one in each set of lines having equal
> keys", and their definition of -n fits into that: 
> 
>     Restrict the sort key to an initial numeric string, consisting
>     of optional <blank> characters, optional <hyphen-minus> character,
>     and zero or more digits with an optional radix character and
>     thousands separators (as defined in the current locale), which
>     shall be sorted by arithmetic value. An empty digit string shall
>     be treated as zero. Leading zeros and signs on zeros shall not
>     affect ordering.
> 
>     https://pubs.opengroup.org/onlinepubs/9799919799/utilities/sort.html
> 
> I think our docs could be improved,

In general, the quality of our sort(1) manual does not feel good
to me.  Parts of it look wordy, other parts vague.

See below for a patch to improve some of the aspects related to the
present report.  I do not claim this patch fixes all problems in the
vicinity, but i fear rabbit holes and prefer incremental progress.

> but the -n behaviour seems valid and, importantly, matches the common
> other implementation and does not seem to violate posix.
> 
> -h is of course an extension, but matching -n seems right.

I agree with all of that.

One aspect i still don't understand is the interaction of -n with "-t.",
for example why "sort -n -t. -k1 -k2 -k3 -k4 < test.in" doesn't
work on the input provided by the OP (maybe parsing "." as a decimal
point takes precedence over the "-t." making it a field separator?
I'm not sure).  I'm not sure how the standard expects field splitting
and number parsing to be related to each other.  But one thing at a time,
so here comes my diff:

Rationale:
The main point is that for all the numeric sort options, we need to say
explicitly what the key is, because the key is what the description
of the -u option refers to.
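
To illustrate with made-up input (not the OP's test.in), and assuming
the C locale so that "." is the radix character: both lines below have
the same -n key, the initial numeric string "1.2", so -u keeps only
one of them.

```shell
# Hypothetical two-line input; everything after the initial numeric
# string "1.2" is not part of the -n key.
printf '1.2.3\n1.2.4\n' | LC_ALL=C sort -n      # keys compare equal, both lines printed
printf '1.2.3\n1.2.4\n' | LC_ALL=C sort -n -u   # equal keys, all but one line suppressed
```

The first command prints both lines, the second prints a single line.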

In the order of the patch, the detailed rationale is:
 1. "implies a stable sort (see below)" is just wrong.
    If anything, -s is above -u, not below - but saying that would
    be useless, it's better to just point to -s directly.
 2. Fix -g in a similar way as -n (see below).
 3. "handles general floating points" sounds logically wrong.
    The text isn't talking about multiple points, but multiple numbers.
 4. Fix -h in a similar way as -n (see below).
 5. Fix the cross reference to df(1).
 6. Say what the key is.
 7. Add the missing indefinite article "an optional minus sign".
 8. Avoid needlessly turning the postpositive participle "including"
    into a parenthetic remark.
 9. Add the missing indefinite article to "decimal point".
 10. Clarify that the decimal point is optional.

OK?
  Ingo


Index: sort.1
===================================================================
RCS file: /cvs/src/usr.bin/sort/sort.1,v
diff -u -r1.65 sort.1
--- sort.1      31 Mar 2022 17:27:27 -0000      1.65
+++ sort.1      12 Mar 2025 19:26:15 -0000
@@ -121,7 +121,8 @@
 is not defined.
 .It Fl u , Fl Fl unique
 Unique: suppress all but one in each set of lines having equal keys.
-This option implies a stable sort (see below).
+This option implies
+.Fl s .
 If used with
 .Fl C
 or
@@ -148,24 +149,25 @@
 Consider all lowercase characters that have uppercase
 equivalents to be the same for purposes of comparison.
 .It Fl g , Fl Fl general-numeric-sort , Fl Fl sort Ns = Ns Cm general-numeric
-Sort by general numerical value.
+Use an initial numeric string as the key and sort numerically.
 As opposed to
 .Fl n ,
-this option handles general floating points.
+this option handles general floating point numbers.
 It has a more
 permissive format than that allowed by
 .Fl n
 but it has a significant performance drawback.
 .It Fl h , Fl Fl human-numeric-sort , Fl Fl sort Ns = Ns Cm human-numeric
-Sort by numerical value, but take into account the SI suffix,
-if present.
+Use an initial numeric string with an optional SI suffix as the key.
 Sorts first by numeric sign (negative, zero, or
 positive); then by SI suffix (either empty, or `k' or `K', or one
 of `MGTPEZY', in that order); and finally by numeric value.
 The SI suffix must immediately follow the number.
 For example, '12345K' sorts before '1M', because M is "larger" than K.
 This sort option is useful for sorting the output of a single invocation
-of 'df' command with
+of a
+.Xr df 1
+command with
 .Fl h
 or
 .Fl H
@@ -176,9 +178,9 @@
 Sort by month abbreviations.
 Unknown strings are considered smaller than valid month names.
 .It Fl n , Fl Fl numeric-sort , Fl Fl sort Ns = Ns Cm numeric
-An initial numeric string, consisting of optional blank space, optional
-minus sign, and zero or more digits (including decimal point)
-is sorted by arithmetic value.
+Use an initial numeric string as the key, consisting of optional
+blank space, an optional minus sign, and zero or more digits including
+an optional decimal point, and sort numerically.
 Leading blank characters are ignored.
 .It Fl R , Fl Fl random-sort , Fl Fl sort Ns = Ns Cm random
 Sort lines in random order.
