On 2021-04-15 18:44, Erik Auerswald wrote:
Hi,

On Thu, Apr 15, 2021 at 11:47:34PM +0200, Vincent Lefevre wrote:
I'm currently using version-sort in order to get integers sorted
in strings (due to the lack of simple numeric sort like in zsh),
but I've noticed some ugliness. This may be bugs, but I'm not sure
[ ... ]
I think all of your problems ("ugliness") are caused by the concept of "file
extensions" in GNU Coreutils version sort.

https://www.gnu.org/software/coreutils/manual/coreutils.html#Special-handling-of-file-extensions

That strikes me as a very poor set of requirements. The treatment
of suffixes is extremely hacky, and unnecessary.

Here is an algorithm + implementation I hacked up in 15 minutes. Here is
the informal spec. Note that it makes no mention of special case hacks
for suffixes, yet suffixes end up treated reasonably:

1. A string is parsed into tokens. There are three kinds of tokens:
   - DOT: (".")
   - INT: decimal string (e.g. "123")
   - TXT: sequence of other characters

2. INT tokens are converted to integer values.

3. The token sequence is then parsed in order to collapse
   INT DOT INT { DOT INT }* sequences into (INT INT ...) lists.

4. Any other INT token not placed into a list is turned into
   a list of one integer: (INT)
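
As a rough illustration of steps 1 to 4, here is a sketch in Python rather
than TXR Lisp. The names TOKEN_RE and ver_parse are made up for this example,
"." stays a plain string, and integer lists are represented as tuples; none
of that comes from the original code.

```python
import re

# Step 1: DOT, INT (ASCII digit runs), or TXT (anything else).
TOKEN_RE = re.compile(r"\.|[0-9]+|[^0-9.]+")


def ver_parse(s):
    """Parse s into a list of ".", text strings, and tuples of ints."""
    # Step 2: convert INT tokens to integers.
    toks = [int(t) if "0" <= t[0] <= "9" else t for t in TOKEN_RE.findall(s)]
    out = []
    i = 0
    while i < len(toks):
        if isinstance(toks[i], int):
            nums = [toks[i]]
            i += 1
            # Step 3: absorb each following DOT INT pair into the same list.
            while (i + 1 < len(toks) and toks[i] == "."
                   and isinstance(toks[i + 1], int)):
                nums.append(toks[i + 1])
                i += 2
            # Step 4: a lone INT simply ends up as a one-element tuple.
            out.append(tuple(nums))
        else:
            out.append(toks[i])
            i += 1
    return out


print(ver_parse("abc-1.2.3-9.tar.gz"))
# ['abc-', (1, 2, 3), '-', (9,), '.', 'tar', '.', 'gz']
```

Note that the "." before "tar" is left alone, because it is not followed by
an INT.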

Then, the resulting sequence is compared as follows:

- TXT-TXT comparisons are ordinary lexicographic

- LIST-LIST comparisons are lexicographic on the list of integers

- Otherwise, the sorting order is DOT < TXT < LIST
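
The comparison rules can be sketched in Python by tagging every item with a
rank, so that items of different kinds are ordered by rank alone. This is
only an approximation under my own assumptions: item_key and seq_key are
hypothetical names, the two sequences are parsed by hand, and I am assuming
that Python's list comparison is close enough to the intended ordering.

```python
def item_key(item):
    """Map a parsed item to a (rank, value) pair: DOT < TXT < LIST."""
    if item == ".":
        return (0, "")
    if isinstance(item, str):
        return (1, item)          # TXT-TXT: ordinary string comparison
    return (2, tuple(item))       # LIST-LIST: lexicographic on integers


def seq_key(items):
    """Key for a whole parsed string; lists compare element by element."""
    return [item_key(x) for x in items]


# Hand-parsed forms of "abc-1d.2c" and "abc-1.2.tar" (assumed, for illustration).
a = ["abc-", (1,), "d", ".", (2,), "c"]
b = ["abc-", (1, 2), ".", "tar"]

print(seq_key(a) < seq_key(b))   # True: (1,) sorts before (1, 2)
```

Because the rank decides every mixed-kind comparison, a string is never
compared against a tuple.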

Sample implementation in TXR Lisp.  Note: to achieve DOT < TXT,
we replace "." tokens with the character object #\.
The TXR Lisp less function then takes care of it:

 (less #\a "a" '(1 2 3)) -> t


Run:

$ txr versort.tl
abc.txt
abc-1d.2c.tar.gz
abc-1.2.tar.gz
abc-1.2c.tar.gz
abc-1.2.3.tar.gz
abc-1.2.3-3.14.tar.gz
abc-1.2.3-4.5.tar.gz
abc-1.2.3-9.tar.gz
abc-1.2.3-9.tgz
abc-1.2.3-9-sig.bin
abc-1.2.3.3.14.tar.gz
abc-2-tar.gz
abc-11-tar.gz
foo.txt
zzz-3.0
zzz-4.0
zzz-xyz-4.5
zzz-xyz-9.15.3

abc-1d.2c.tar.gz is before abc-1.2 because d is not part of the version number.
This is a case of version 1 coming before 1.2.

(Don't have trailing junk in your version numbers, except possibly at the very
end; keep them numeric!)

Code in versort.tl:


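;; Split a string into tokens: ".", runs of digits, and runs of anything else.
(defun ver-tok (str)
  (tok #/\.|\d+|[^\d.]+/ str))

;; Turn a string into a list of text strings, #\. characters (for "."),
;; and lists of integers (for version-number runs).
(defun ver-parse (str)
  (let ((all-toks (ver-tok str)))
    (labels ((convert (toks)
               ;; Tokens that start with a digit become integers.
               (mapcar [iffi (fr^ #/[0-9]/) toint] toks))
             (parse (:match)
               ;; INT "." INT: start a two-element list, then reparse.
               (((@(integerp @a) "." @(integerp @b) . @rest))
                 (parse (cons (list a b) rest)))
               ;; Any other INT: wrap it in a one-element list.
               (((@(integerp @a) . @rest))
                 (parse (cons (list a) rest)))
               ;; LIST "." INT: extend the list with the next integer.
               (((@(listp @a) "." @(integerp @b) . @rest))
                 (parse (cons (append (flatten a) (list b)) rest)))
               ;; A remaining "." becomes #\. so that it sorts before text.
               ((("." . @rest)) (cons #\. (parse rest)))
               ;; Everything else is kept as-is.
               (((@a . @rest)) (cons a (parse rest)))
               ((@else) else)))
      (parse (convert all-toks)))))

;; Rebuild a string from its parsed form; integer lists are joined with ".".
(defun ver-recombine (vsyntax)
  (cat-str (mapcar [iffi consp
                        [chain (op mapcar tostring)
                               (ap join-with ".")]]
                   vsyntax)))

;; Sort the parsed forms, then turn them back into strings.
(defun ver-sort (strings)
  [mapcar ver-recombine (sort [mapcar ver-parse strings])])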

(let ((data '("abc-1.2.3.tar.gz"
              "zzz-4.0"
              "abc-11-tar.gz"
              "abc-2-tar.gz"
              "abc-1d.2c.tar.gz"
              "abc-1.2c.tar.gz"
              "abc-1.2.3-9-sig.bin"
              "abc-1.2.tar.gz"
              "abc-1.2.3-9.tar.gz"
              "abc-1.2.3-3.14.tar.gz"
              "abc-1.2.3.3.14.tar.gz"
              "abc-1.2.3-9.tgz"
              "zzz-3.0"
              "foo.txt"
              "abc.txt"
              "abc-1.2.3-4.5.tar.gz"
              "zzz-xyz-9.15.3"
              "zzz-xyz-4.5")))
  (tprint (ver-sort data)))
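
For readers without TXR, here is a compact Python approximation of the same
ordering. It is not a translation of versort.tl: it takes a shortcut by
matching a whole dotted run of integers with one regular expression, and it
sorts with a key function instead of parsing and recombining. The names
VERSION_OR_TOKEN, sort_key and ver_sort are my own, and I have only checked
it against the file names used in this message.

```python
import re

# A dotted run of integers, or a lone ".", or a run of other characters.
VERSION_OR_TOKEN = re.compile(r"[0-9]+(?:\.[0-9]+)*|\.|[^0-9.]+")


def sort_key(name):
    """Build a key in which DOT < TXT < LIST for every position."""
    key = []
    for tok in VERSION_OR_TOKEN.findall(name):
        if tok == ".":
            key.append((0, ""))
        elif "0" <= tok[0] <= "9":
            key.append((2, tuple(int(n) for n in tok.split("."))))
        else:
            key.append((1, tok))
    return key


def ver_sort(names):
    """Return the names sorted by sort_key; the strings are not modified."""
    return sorted(names, key=sort_key)


for name in ver_sort(["abc-1.2.tar.gz", "abc-1d.2c.tar.gz",
                      "abc-1.2c.tar.gz", "abc.txt"]):
    print(name)
# abc.txt
# abc-1d.2c.tar.gz
# abc-1.2.tar.gz
# abc-1.2c.tar.gz
```

Sorting by key leaves the input strings untouched, so no recombination step
is needed.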

