hi.

reads fine to me, ingo. ok.
jmc

On 12 March 2025 19:33:35 GMT, Ingo Schwarze <schwa...@usta.de> wrote:
>Hi Stuart, hi Mark, hi Jason,
>
>Stuart Henderson wrote on Wed, Mar 12, 2025 at 03:11:27PM +0000:
>> On 2025/03/12 14:54, Mark Kettenis wrote:
>
>>> Well, that makes some sort of sense if you interpret the strings as
>>> floating point numbers and ignore everything after as garbage.
>
>> GNU's implementation of sort behaves exactly the same with -h and -n,
>> their manual says "output only the first of an equal run".
>>
>> posix says "suppress all but one in each set of lines having equal
>> keys", and their definition of -n fits into that:
>>
>> Restrict the sort key to an initial numeric string, consisting
>> of optional <blank> characters, optional <hyphen-minus> character,
>> and zero or more digits with an optional radix character and
>> thousands separators (as defined in the current locale), which
>> shall be sorted by arithmetic value. An empty digit string shall
>> be treated as zero. Leading zeros and signs on zeros shall not
>> affect ordering.
>>
>> https://pubs.opengroup.org/onlinepubs/9799919799/utilities/sort.html
>>
>> I think our docs could be improved,
>
>In general, the quality of our sort(1) manual does not feel good
>to me. Parts of it look wordy, other parts vague.
>
>See below for a patch to improve some of the aspects related to the
>present report. I do not claim this patch fixes all problems in the
>vicinity, but i fear rabbit holes and prefer incremental progress.
>
>> but the -n behaviour seems valid and, importantly, matches the common
>> other implementation and does not seem to violate posix.
>>
>> -h is of course an extension, but matching -n seems right.
>
>I agree with all of that.
>
>One aspect i still don't understand is the interaction of -n with "-t.",
>for example why "sort -n -t. -k1 -k2 -k3 -k4 < test.in" doesn't
>work on the input provided by the OP (maybe parsing "." as a decimal
>point takes precedence over the "-t." making it a field separator?
>I'm not sure). I'm not sure how the standard expects field splitting
>and number parsing to be related to each other. But one thing at a time,
>so here comes my diff:
>
>Rationale:
>The main point is that for all the numeric sort options, we need to say
>explicitly what the key is, because the key is what the description
>of the -u option refers to.
>
>In the order of the patch, the detailed rationale is:
> 1. "implies a stable sort (see below)" is just wrong.
>    If anything, -s is above -u, not below - but saying that would
>    be useless, it's better to just point to -s directly.
> 2. Fix -g in a similar way as -n (see below).
> 3. "handles general floating points" sounds logically wrong.
>    The text isn't talking about multiple points, but multiple numbers.
> 4. Fix -h in a similar way as -n (see below).
> 5. Fix the cross reference to df(1).
> 6. Say what the key is.
> 7. Add the missing indefinite article "an optional minus sign".
> 8. Avoid needlessly turning the postpositive participle "including"
>    into a parenthetic remark.
> 9. Add the missing indefinite article to "decimal point".
> 10. Clarify that the decimal point is optional.
>
>OK?
> Ingo
>
>
>Index: sort.1
>===================================================================
>RCS file: /cvs/src/usr.bin/sort/sort.1,v
>diff -u -r1.65 sort.1
>--- sort.1 31 Mar 2022 17:27:27 -0000 1.65
>+++ sort.1 12 Mar 2025 19:26:15 -0000
>@@ -121,7 +121,8 @@
> is not defined.
> .It Fl u , Fl Fl unique
> Unique: suppress all but one in each set of lines having equal keys.
>-This option implies a stable sort (see below).
>+This option implies
>+.Fl s .
> If used with
> .Fl C
> or
>@@ -148,24 +149,25 @@
> Consider all lowercase characters that have uppercase
> equivalents to be the same for purposes of comparison.
> .It Fl g , Fl Fl general-numeric-sort , Fl Fl sort Ns = Ns Cm general-numeric
>-Sort by general numerical value.
>+Use an initial numeric string as the key and sort numerically.
> As opposed to
> .Fl n ,
>-this option handles general floating points.
>+this option handles general floating point numbers.
> It has a more
> permissive format than that allowed by
> .Fl n
> but it has a significant performance drawback.
> .It Fl h , Fl Fl human-numeric-sort , Fl Fl sort Ns = Ns Cm human-numeric
>-Sort by numerical value, but take into account the SI suffix,
>-if present.
>+Use an initial numeric string with an optional SI suffix as the key.
> Sorts first by numeric sign (negative, zero, or
> positive); then by SI suffix (either empty, or `k' or `K', or one
> of `MGTPEZY', in that order); and finally by numeric value.
> The SI suffix must immediately follow the number.
> For example, '12345K' sorts before '1M', because M is "larger" than K.
> This sort option is useful for sorting the output of a single invocation
>-of 'df' command with
>+of a
>+.Xr df 1
>+command with
> .Fl h
> or
> .Fl H
>@@ -176,9 +178,9 @@
> Sort by month abbreviations.
> Unknown strings are considered smaller than valid month names.
> .It Fl n , Fl Fl numeric-sort , Fl Fl sort Ns = Ns Cm numeric
>-An initial numeric string, consisting of optional blank space, optional
>-minus sign, and zero or more digits (including decimal point)
>-is sorted by arithmetic value.
>+Use an initial numeric string as the key, consisting of optional
>+blank space, an optional minus sign, and zero or more digits including
>+an optional decimal point, and sort numerically.
> Leading blank characters are ignored.
> .It Fl R , Fl Fl random-sort , Fl Fl sort Ns = Ns Cm random
> Sort lines in random order.
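
For readers following the -n / -u discussion above, here is a minimal Python sketch of how the key described in the patched -n text interacts with -u. It is only a model, not the sort(1) implementation: the names numeric_key and sort_nu are hypothetical, "." is assumed to be the decimal point, thousands separators and locale handling are left out, and the sample input is made up rather than taken from the original report.

```python
import re
from fractions import Fraction

# Initial numeric string as described for -n: optional blank space, an
# optional minus sign, zero or more digits, and an optional decimal point.
# Assumption: "." is the radix character; thousands separators are ignored.
_NUMERIC_PREFIX = re.compile(r"[ \t]*(-?)([0-9]*)(?:\.([0-9]*))?")


def numeric_key(line):
    """Return the arithmetic value of the initial numeric string of line."""
    sign, whole, frac = _NUMERIC_PREFIX.match(line).groups()
    frac = frac or ""
    digits = whole + frac
    if not digits:
        return Fraction(0)  # an empty digit string is treated as zero
    value = Fraction(int(digits), 10 ** len(frac))
    return -value if sign else value


def sort_nu(lines):
    """Model of "sort -n -u": stable numeric sort, one line kept per key."""
    result = []
    for line in sorted(lines, key=numeric_key):  # sorted() is stable
        # Keep only the first line of each run of equal keys.
        if not result or numeric_key(result[-1]) != numeric_key(line):
            result.append(line)
    return result
```

With this model, sort_nu(["10.0.0.2", "10.0.0.1", "9.255.0.0"]) returns ["9.255.0.0", "10.0.0.2"]: both 10.0.0.x lines reduce to the key 10.0, everything after the second "." is ignored, and only the first line of that run survives.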