Package: ispell
Version: 3.1.20.0-4.1
Severity: important
Tags: patch

In NMU version 3.1.20.0-4.1 the behavior of munchlist changed
substantially, because the applied patch rewrote every "sort" call
as follows:
 sort +a -b
was converted to
 sort -ka',b'
where a'=a+1 and b'=b+1.

According to an old (woody) man page of sort(1) this is not fully
correct:

       +POS1 [-POS2]
              start a key at POS1, end it *before* POS2 (obsolescent)
              field numbers and character offsets are numbered
              starting with  zero  (contrast  with  the  -k option)

       -k POS1[,POS2]
              start a key at POS1, end it *at* POS2 field numbers
              and  character  offsets  are numbered starting with
              one (contrast with zero-based +POS form)

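So the correct new call (correct meaning identical to the old behavior
of munchlist) leaves the end position unchanged, because a zero-based
exclusive end b names the same field as a one-based inclusive end b:
 sort -ka',b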
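
The translation rule can be sketched as a small shell function. This is
only an illustration: old_to_new_key is a hypothetical helper, not part
of munchlist or the patch, and it assumes whole-field positions without
character offsets or modifiers such as f or rn.

```shell
# Hypothetical helper: translate an obsolescent zero-based key "+a -b"
# (whole fields only, no character offsets) into the one-based -k form.
old_to_new_key() {
    a=$1   # old start field, zero-based
    b=$2   # old end field, zero-based and exclusive
    # The start shifts by one; the end keeps its numeric value, because
    # "before field b, counting from 0" equals "at field b, counting from 1".
    printf '%s\n' "-k$((a + 1)),$b"
}

old_to_new_key 0 1   # old "+0 -1" becomes -k1,1
old_to_new_key 1 2   # old "+1 -2" becomes -k2,2
```

Under these assumptions, an old "+0 -1" maps to -k1,1 rather than
-k1,2, which is the change the attached patch makes.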

As a result of the behavior change in 3.1.20.0-4.1, the munchlist
output for the German ispell dictionary grew to about 2.5 times its
previous size:

Input:                77802 lines
3.1.20.0-4 Output:    81077 lines
3.1.20.0-4.1 Output: 201682 lines 

So as you can see, something is wrong here.
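
One plausible way a key that is one field too wide inflates the output
is through "sort -u": lines that used to compare equal on the key no
longer do, so fewer of them are merged. A minimal sketch, using made-up
sample data rather than real munchlist intermediate files:

```shell
# Hypothetical three-field sample; the second field is identical.
sample='rootA word 5
rootB word 3'

# Key limited to field 2: both lines compare equal, -u keeps one.
narrow=$(( $(printf '%s\n' "$sample" | LC_ALL=C sort -u -k2,2 | wc -l) ))

# Key running through field 3: the lines differ, -u keeps both.
wide=$(( $(printf '%s\n' "$sample" | LC_ALL=C sort -u -k2,3 | wc -l) ))

echo "narrow=$narrow wide=$wide"
```

The same pattern appears in the "sort -um -k2,3" call that the patch
changes to "-k2,2"; whether that call alone accounts for the growth is
not verified here.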

The attached patch (against munchlist in 3.1.20.0-4.1) restores the
behavior of munchlist in 3.1.20.0-4 while still using the new sort(1)
syntax.

-- System Information:
Debian Release: testing/unstable
  APT prefers testing
  APT policy: (500, 'testing'), (500, 'stable'), (50, 'unstable')
Architecture: amd64 (x86_64)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.14.3ro2
Locale: LANG=de_DE, LC_CTYPE=de_DE (charmap=ISO-8859-1)

Versions of packages ispell depends on:
ii  dictionaries-common         0.65.0       Common utilities for spelling dict
ii  iamerican [ispell-dictionar 3.1.20.0-4.1 An American English dictionary for
ii  ibritish [ispell-dictionary 3.1.20.0-4.1 A British English dictionary for i
ii  ingerman [ispell-dictionary 20030222-8   New German orthography dictionary 
ii  iogerman [ispell-dictionary 2-23         Old German orthography dictionary 
ii  libc6                       2.3.5-8.1    GNU C Library: Shared libraries an
ii  libncurses5                 5.5-1        Shared libraries for terminal hand

Versions of packages ispell recommends:
ii  miscfiles [wordlist]          1.4.2-3    Dictionaries and other interesting
ii  wamerican [wordlist]          6-2        American English dictionary words 
ii  wngerman [wordlist]           20030222-8 New German orthography wordlist

-- no debconf information

Tschoeeee

        Roland

-- 
 * [EMAIL PROTECTED] * http://www.spinnaker.de/ *
--- new/munchlist       2006-02-11 15:31:07.000000000 +0100
+++ ro/munchlist        2006-02-11 15:42:45.000000000 +0100
@@ -357,7 +357,7 @@
     cat "$@" | ispell "$wchars" -e1 -d $FAKEHASH -p /dev/null | tr " " '
 '
 fi \
-  | sort $SORTTMP -u -k1f,2 -k1 \
+  | sort $SORTTMP -u -k1f,1 -k1 \
   | $COMBINE $icflags $langtabs \
   | sort $SORTTMP -u > $EXPANDEDINPUT
 #
@@ -417,7 +417,7 @@
 ispell "$wchars" -c -W0 -d $FAKEHASH -p /dev/null < $STRIPPEDINPUT \
   | tr " " '
 ' \
-  | egrep "$flagmarker" | sort $SORTTMP -u "-t$flagmarker" -k1,2 -k2 \
+  | egrep "$flagmarker" | sort $SORTTMP -u "-t$flagmarker" -k1,1 -k2 \
   | $JOIN $SIGNED "-t$flagmarker" - $EXPANDEDINPUT > $CRUNCHEDINPUT
 #
 # We now have a list of legal roots, and of affixes that apply to the
@@ -471,7 +471,7 @@
   | (sed -e 's; .*$;;' ; /bin/rm -f $JOINEDPAIRS $EXPANDEDPAIRS) \
   | uniq \
   | (comm -13 - $CRUNCHEDINPUT ; /bin/rm -f $CRUNCHEDINPUT) \
-  | sort $SORTTMP -u "-t$flagmarker" -k1f,2 -k1 \
+  | sort $SORTTMP -u "-t$flagmarker" -k1f,1 -k1 \
   | $COMBINE $langtabs > $LEGALFLAGLIST
 
 #
@@ -501,7 +501,7 @@
   | sort $SORTTMP \
   | uniq -c \
   | tr '       ' ' ' \
-  | sort $SORTTMP -k1rn,2 -k3 > $PRODUCTLIST
+  | sort $SORTTMP -k1rn,1 -k3 > $PRODUCTLIST
 
 if [ `egrep ' p ' $PRODUCTLIST | wc -l` -gt 0 \
   -a `egrep ' s ' $PRODUCTLIST | wc -l` -gt 0 ]
@@ -686,7 +686,7 @@
              D
              }' \
          | comm -23 - $ILLEGALCOMBOS \
-         | sort $SORTTMP -u "-t$flagmarker" -k1f,2 -k1 \
+         | sort $SORTTMP -u "-t$flagmarker" -k1f,1 -k1 \
          | $COMBINE $langtabs > $CROSSROOTS
        mv $CROSSROOTS $LEGALFLAGLIST
        if [ "$debug" = yes ]
@@ -722,10 +722,10 @@
 #
 $verbose  &&  echo 'Eliminating non-optimal affixes.' 1>&2
 ispell "$wchars" -e4 -d $FAKEHASH -p /dev/null < $LEGALFLAGLIST \
-  | sort $SORTTMP -k2,3 -k3rn,4 -k1,2 \
-  | sort $SORTTMP -um -k2,3 \
+  | sort $SORTTMP -k2,2 -k3rn,3 -k1,1 \
+  | sort $SORTTMP -um -k2,2 \
   | sed -e 's; .*$;;' \
-  | sort $SORTTMP -u "-t$flagmarker" -k1f,2 -k1 > $MINIMALAFFIXES
+  | sort $SORTTMP -u "-t$flagmarker" -k1f,1 -k1 > $MINIMALAFFIXES
 /bin/rm -f $LEGALFLAGLIST
 #
 # Now we're almost done.  MINIMALAFFIXES covers some (with luck, most)
@@ -756,7 +756,7 @@
       | sort $SORTTMP "-t$flagmarker" -u -k1f,1 -k1
 else
     # MINIMALAFFIXES is empty;  just produce a sorted version of STRIPPEDINPUT
-    sort $SORTTMP "-t$flagmarker" -u -k1f,2 -k1 $STRIPPEDINPUT
+    sort $SORTTMP "-t$flagmarker" -u -k1f,1 -k1 $STRIPPEDINPUT
 fi
 /bin/rm -rf ${TDIR}
 if [ "X$MUNCHMAIL" != X ]
