On 10/07/19 19:57, Assaf Gordon wrote:
> Hello all,
> 
> I would like to suggest adding a new chapter to the manual,
> detailing the nitty-gritties of "version sort" in coreutils.
> 
> Attached suggested patch.
> 
> The new sections are:
> 
> 30 Version sort ordering
>   30.1 Version sort overview
>     30.1.1 Using version sort in GNU Coreutils
>     30.1.2 Origin of version sort and differences from natural sort
>     30.1.3 Correct/Incorrect ordering and Expected/Unexpected results
>   30.2 Implementation Details
>     30.2.1 Version-sort ordering rules
>     30.2.2 Version sort is not the same as numeric sort
>     30.2.3 Punctuation Characters
>     30.2.4 Punctuation Characters vs letters
>     30.2.5 Tilde ‘~’ character
>     30.2.6 Version sort uses ASCII order, ignores locale, unicode characters
>   30.3 Differences from the official Debian Algorithm
>     30.3.1 Minus/Hyphen ‘-’ and Colons ‘:’ characters
>     30.3.2 Additional hard-coded priorities In GNU coreutils’ version sort
>     30.3.3 Special handling of file extensions
>   30.4 Advanced Topics
>     30.4.1 Comparing two strings using Debian’s algorithm
>     30.4.2 Reporting bugs or incorrect results
>     30.4.3 Other version/natural sort implementations
>     30.4.4 Related Source code

This looks great.
A few adjustments attached.

many thanks,
Pádraig

From 553d6d6c95af2fe89ec93558b46e699122c5deca Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <p...@draigbrady.com>
Date: Thu, 11 Jul 2019 15:34:17 +0100
Subject: [PATCH] doc: adjustments to version sorting docs

* doc/sort-version.texi: ...
---
 doc/sort-version.texi | 71 ++++++++++++++++++++++++---------------------------
 1 file changed, 33 insertions(+), 38 deletions(-)

diff --git a/doc/sort-version.texi b/doc/sort-version.texi
index e5f8fb7..e2a4283 100644
--- a/doc/sort-version.texi
+++ b/doc/sort-version.texi
@@ -1,4 +1,4 @@
-@c GNU Verion-sort ordering documentation
+@c GNU Version-sort ordering documentation
 
 @c Copyright (C) 2019 Free Software Foundation, Inc.
 
@@ -19,10 +19,10 @@
 @node Version sort overview
 @section Version sort overview
 
-@dfn{version sort} ordering (and simiarly, @dfn{natural sort}
+@dfn{version sort} ordering (and similarly, @dfn{natural sort}
 ordering) is a method to sort items such as file names and lines of
 text in an order that feels more natural to people, when the text
-contain a mixture of letters and digits.
+contains a mixture of letters and digits.
 
 Standard sorting usually does not produce the order that one expects
 because comparisons are made on a character-by-character basis.
@@ -38,13 +38,13 @@ a13                          a13
 a2                           a120
 @end example
 
-version sort funtionality in GNU coreutils is available in the @samp{ls -v},
+version sort functionality in GNU coreutils is available in the @samp{ls -v},
 @samp{ls --sort=version}, @samp{sort -V}, @samp{sort --version-sort} commands.
 
 
 
-@node Using version sort in GNU Coreutils
-@subsection Using version sort in GNU Coreutils
+@node Using version sort in GNU coreutils
+@subsection Using version sort in GNU coreutils
 
 Two GNU coreutils programs use version sort: @command{ls} and @command{sort}.
 
@@ -113,8 +113,8 @@ In coreutils this algorithm was slightly modified to work on more
 general input such as textual strings and file names
 (see @ref{Differences from the official Debian Algorithm}).
 
-In other contextes, such as other programs and other programming
-languages, a similar sorting funtionality is called
+In other contexts, such as other programs and other programming
+languages, a similar sorting functionality is called
 @uref{https://en.wikipedia.org/wiki/Natural_sort_order,natural sort}.
 
 
@@ -125,34 +125,29 @@ Currently there is no standard for version/natural sort ordering.
 
 That is: there is no one correct way or universally agreed-upon way to
 order items. Each program and each programming language can decide its
-own ordering algorithm and call it ’natural sort’ (or other various
+own ordering algorithm and call it 'natural sort' (or other various
 names).
 
-Therefore there is no point in complaining about incorrect sorting
-order or unexpected results: Coreutils’ version sort order is not
-incorrect, it might just differ from other similarly named
-implementation, or differ from personal expectations.
-
 See @ref{Other version/natural sort implementations} for many examples of
 differing sorting possibilities, each with its own rules and variations.
 
-If you do suspect a bug in coreutils’ implementation of version-sort,
+If you do suspect a bug in coreutils' implementation of version-sort,
 see @ref{Reporting bugs or incorrect results} on how to report them.
 
 
 @node Implementation Details
 @section Implementation Details
 
-GNU Coreutils’ version sort algorithm is based on
+GNU coreutils' version sort algorithm is based on
 @uref{https://www.debian.org/doc/debian-policy/ch-controlfields.html#version,
-Debian’s versioning scheme}, specifically on the "upstream version"
+Debian's versioning scheme}, specifically on the "upstream version"
 part.
 
 This section describe the ordering rules.
 
 The next section (@ref{Differences from the official Debian
 Algorithm}) describes some differences between GNU coreutils
-implementation and Debian’s official algorithm.
+implementation and Debian's official algorithm.
 
 
 @node Version-sort ordering rules
@@ -287,7 +282,7 @@ $ sort -n input4                  $ sort -V input4
 
 Numeric sort (@samp{sort -n}) treats the entire string as a single numeric
 value, and compares it to other values. For example, @code{8.1}, @code{8.10} and
-@code{8.100} are numerically equivalent, and are ordered together. Simiarly,
+@code{8.100} are numerically equivalent, and are ordered together. Similarly,
 @code{8.49} is numerically smaller than @code{8.5}, and appears before first.
 
 Version sort (@samp{sort -v}) first breaks down the string into digits and
@@ -301,7 +296,7 @@ remaining digits are compared numerically (@code{1} and @code{01}) -
 which are numerically equivalent. Hence, @code{8.01} and @code{8.1}
 are grouped together.
 
-Simiarly, comparing @code{8.5} to @code{8.49} - the @samp{@code{8}}
+Similarly, comparing @code{8.5} to @code{8.49} - the @samp{@code{8}}
 and @samp{@code{.}} parts are identical, then the numeric values @code{5} and
 @code{49} are compared. The resulting @code{5} appears before @code{49}.
 
@@ -354,7 +349,7 @@ $ touch   1.0.5_src.tar.gz     1.0%zzzzz.gz
 
 The same reasoning applies to the following example: The character
 @samp{@code{.}}  has ASCII value 46, and is smaller than slash
-characeter @samp{@code{/}} ASCII value 47:
+character @samp{@code{/}} ASCII value 47:
 
 @example
 $ cat input5
@@ -431,7 +426,7 @@ and is listed first in the sorted output.
 
 The remaining lines (@code{1}, @code{1%}, @code{1.2}, @code{1~})
 follow similar logic: The digit part is extracted (1 for all strings)
-and compares identical. The following extracted parts for the remainig
+and compares identical. The following extracted parts for the remaining
 input lines are: empty part, @code{%}, @code{.}, @code{~}.
 
 Tilde sorts before all others, hence the line @code{1~} appears next.
@@ -475,14 +470,14 @@ value 37 is smaller, hence @samp{@code{a%}} is listed before @samp{@code{aα}}.
 @section Differences from the official Debian Algorithm
 
 The GNU coreutils' version sort algorithm differs slightly from the
-official Debian algorith, in order to accomodate more general usage
+official Debian algorithm, in order to accommodate more general usage
 and file name listing.
 
 
 @node Minus/Hyphen @samp{-} and Colons @samp{:} characters
 @subsection Minus/Hyphen @samp{-} and Colons @samp{:} characters
 
-In Debian’s version string syntax the version consists of three parts:
+In Debian's version string syntax the version consists of three parts:
 @code{[epoch:]upstream_version[-debian_revision]} (@code{epoch} and
 @code{debian_revision} are optional).
 
@@ -504,7 +499,7 @@ If epoch is not present, colons @samp{:} are not allowed.
 If these parts are present, hyphen and/or colons can appear only onces
 in valid Debian version strings.
 
-In GNU Coreutils such restrictions are not reasonable (a filename can
+In GNU coreutils such restrictions are not reasonable (a file name can
 have many hyphens, a line of text can have many colons).
 
 As a result, in GNU coreutils hyphens and colons are treated exactly
@@ -530,10 +525,10 @@ With Debian's @command{dpkg} they will be listed as @code{ab-cd} first and
 
 For further technical details see @uref{https://bugs.gnu.org/35939,bug35939}.
 
-@node Additional hard-coded priorities In GNU coreutils’ version sort
-@subsection Additional hard-coded priorities In GNU coreutils’ version sort
+@node Additional hard-coded priorities In GNU coreutils' version sort
+@subsection Additional hard-coded priorities In GNU coreutils' version sort
 
-In GNU coreutils’ version sort algorithm, he following items have
+In GNU coreutils' version sort algorithm, the following items have
 special priority and sort earlier than all other characters (listed in
 order);
 
@@ -574,7 +569,7 @@ the ordering rules are the same.
 @node Special handling of file extensions
 @subsection Special handling of file extensions
 
-GNU coreutils’ version sort algorithm implements specialized handling
+GNU coreutils' version sort algorithm implements specialized handling
 of file extensions (or strings that look like file names with
 extensions).
 
@@ -591,7 +586,7 @@ letter or tilde, followed by one or more letters, digits, or tildes
 @code{(\.[A-Za-z~][A-Za-z0-9~]*)*}).
 
 @item
-If the strings contains suffixes, the sufffixes are temporarily
+If the strings contains suffixes, the suffixes are temporarily
 removed, and the strings are compared without them (using the
 @ref{Version-sort ordering rules,algorithm,algorithm} above).
 
@@ -600,7 +595,7 @@ If the suffix-less strings are identical, the suffix is restored and
 the entire strings are compared.
 
 @item
-If the suffix-les strings differ, the result is returned and the
+If the non-suffixed strings differ, the result is returned and the
 suffix is effectively ignored.
 @end enumerate
 
@@ -704,12 +699,12 @@ being first.
 
 A real-world example would be listing files such as:
 @file{gcc_10.fc9.tar.gz}
-and @file{gcc_10.8.12.7rc2.fc9.tar.bz2}: Debian’s algorithm would list
+and @file{gcc_10.8.12.7rc2.fc9.tar.bz2}: Debian's algorithm would list
 @file{gcc_10.8.12.7rc2.fc9.tar.bz2 first}, while @samp{ls -v} will list
 @file{gcc_10.fc9.tar.gz} first.
 
 These priorities make sense for @samp{ls -v}:
-versioned files will be listed in a more natural order.
+Versioned files will be listed in a more natural order.
 
 For @samp{sort -V} these priorities might seem arbitrary. However,
 because the sorting code is shared between the ls and sort program,
@@ -774,7 +769,7 @@ ab-cd                               abb
 abb                                 ab-cd
 @end example
 
-To illustrate the differnt handling of file extension: (see @ref{Special
+To illustrate the different handling of file extension: (see @ref{Special
 handling of file extensions}):
 
 @example
@@ -797,7 +792,7 @@ output of @samp{ls -v} or @samp{sort -V}), please first check the following:
 
 @enumerate
 @item
-Is the result consistant with Debian’s own ordering (using @command{dpkg}, see
+Is the result consistent with Debian's own ordering (using @command{dpkg}, see
 @ref{Comparing two strings using Debian's algorithm}) ? If it is, then this
 is not a bug - please do not report it.
 
@@ -809,7 +804,7 @@ then this is not a bug - please do not report it.
 @item
 If you have a question about specific ordering which is not explained
 here, please write to coreutils@@gnu.org, and provide concrete yet
-concise example that will helps us help you.
+concise example that will help us help you.
 
 @item
 If you still suspect a bug which is not explained by the above, please
@@ -869,7 +864,7 @@ function to compare two directory entries (despite the names, they are
 not identical to GNU coreutils' version sort ordering).
 
 @item
-Using Debian’s sorting algorithm in:
+Using Debian's sorting algorithm in:
 
 @itemize
 @item
@@ -901,7 +896,7 @@ Debian's code which performs the @code{upstream_version} comparison:
 version.c}.
 
 @item
-GNULIB code (used by GNU Coreutils) which performs the version comparison:
+GNULIB code (used by GNU coreutils) which performs the version comparison:
 @uref{https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/filevercmp.c,
 filevercmp.c}.
 @end itemize
-- 
2.9.3
