Package: ispell
Version: 3.1.20.0-4.1
Severity: important
Tags: patch
In NMU version 3.1.20.0-4.1 the behavior of munchlist changed
substantially, because the applied patch made the following change to
every "sort" call:
sort +a -b
was converted to
sort -ka',b'
where a'=a+1 and b'=b+1.
According to an old (woody) man page of sort(1) this is not fully
correct:
+POS1 [-POS2]
start a key at POS1, end it *before* POS2 (obsolescent)
field numbers and character offsets are numbered
starting with zero (contrast with the -k option)
-k POS1[,POS2]
start a key at POS1, end it *at* POS2 field numbers
and character offsets are numbered starting with
one (contrast with zero-based +POS form)
So the correct new call, meaning one that reproduces the old behavior
of munchlist, should be
sort -ka',b
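The off-by-one in the end position can be illustrated with a small sketch. It assumes a POSIX sort(1); the helper name old_to_new_key and the sample words are made up for illustration and are not part of munchlist. The second half only shows how a key that ends one field too late stops -u from merging lines; the real munchlist pipelines are more involved.

```shell
#!/bin/sh
# Hypothetical helper: translate the obsolescent "+a -b" key into -k form.
# The start moves from zero-based to one-based (a+1). The end stays b,
# because "-b" ends *before* zero-based field b, i.e. *at* one-based field b.
old_to_new_key() {
    printf '%s\n' "-k$(($1 + 1)),$2"
}

old_to_new_key 0 1    # prints -k1,1
old_to_new_key 1 2    # prints -k2,2

# Two made-up records that share field 1 but differ in field 2.
input='apple x
apple y'

# Key ends at field 1: both keys are "apple", so -u keeps one line.
narrow=$(printf '%s\n' "$input" | LC_ALL=C sort -u -k1,1 | wc -l | tr -d ' ')

# Key ends at field 2: the keys differ, so -u keeps both lines.
wide=$(printf '%s\n' "$input" | LC_ALL=C sort -u -k1,2 | wc -l | tr -d ' ')

echo "-k1,1 keeps $narrow line(s); -k1,2 keeps $wide line(s)"
```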
As a result of the behavior change in 3.1.20.0-4.1, the output for the
German ispell dictionary grew by roughly 150%:
Input: 77802 lines
3.1.20.0-4 Output: 81077 lines
3.1.20.0-4.1 Output: 201682 lines
So as you can see, something is wrong here.
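The relative growth implied by the two output line counts above can be checked with plain shell arithmetic. This is only a sketch; the variable names are made up, and the integer division truncates the result.

```shell
#!/bin/sh
old=81077     # output lines with 3.1.20.0-4
new=201682    # output lines with 3.1.20.0-4.1

# Percentage growth of the output, truncated to an integer.
growth=$(( (new - old) * 100 / old ))
echo "growth: ${growth}%"    # prints "growth: 148%"
```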
The attached patch (against munchlist in 3.1.20.0-4.1) restores the
behavior of munchlist in 3.1.20.0-4 while still using the new sort(1)
syntax.
-- System Information:
Debian Release: testing/unstable
APT prefers testing
APT policy: (500, 'testing'), (500, 'stable'), (50, 'unstable')
Architecture: amd64 (x86_64)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.14.3ro2
Locale: LANG=de_DE, LC_CTYPE=de_DE (charmap=ISO-8859-1)
Versions of packages ispell depends on:
ii dictionaries-common 0.65.0 Common utilities for spelling dict
ii iamerican [ispell-dictionar 3.1.20.0-4.1 An American English dictionary for
ii ibritish [ispell-dictionary 3.1.20.0-4.1 A British English dictionary for i
ii ingerman [ispell-dictionary 20030222-8 New German orthography dictionary
ii iogerman [ispell-dictionary 2-23 Old German orthography dictionary
ii libc6 2.3.5-8.1 GNU C Library: Shared libraries an
ii libncurses5 5.5-1 Shared libraries for terminal hand
Versions of packages ispell recommends:
ii miscfiles [wordlist] 1.4.2-3 Dictionaries and other interesting
ii wamerican [wordlist] 6-2 American English dictionary words
ii wngerman [wordlist] 20030222-8 New German orthography wordlist
-- no debconf information
Tschoeeee
Roland
--
* [EMAIL PROTECTED] * http://www.spinnaker.de/ *
--- new/munchlist 2006-02-11 15:31:07.000000000 +0100
+++ ro/munchlist 2006-02-11 15:42:45.000000000 +0100
@@ -357,7 +357,7 @@
cat "$@" | ispell "$wchars" -e1 -d $FAKEHASH -p /dev/null | tr " " '
'
fi \
- | sort $SORTTMP -u -k1f,2 -k1 \
+ | sort $SORTTMP -u -k1f,1 -k1 \
| $COMBINE $icflags $langtabs \
| sort $SORTTMP -u > $EXPANDEDINPUT
#
@@ -417,7 +417,7 @@
ispell "$wchars" -c -W0 -d $FAKEHASH -p /dev/null < $STRIPPEDINPUT \
| tr " " '
' \
- | egrep "$flagmarker" | sort $SORTTMP -u "-t$flagmarker" -k1,2 -k2 \
+ | egrep "$flagmarker" | sort $SORTTMP -u "-t$flagmarker" -k1,1 -k2 \
| $JOIN $SIGNED "-t$flagmarker" - $EXPANDEDINPUT > $CRUNCHEDINPUT
#
# We now have a list of legal roots, and of affixes that apply to the
@@ -471,7 +471,7 @@
| (sed -e 's; .*$;;' ; /bin/rm -f $JOINEDPAIRS $EXPANDEDPAIRS) \
| uniq \
| (comm -13 - $CRUNCHEDINPUT ; /bin/rm -f $CRUNCHEDINPUT) \
- | sort $SORTTMP -u "-t$flagmarker" -k1f,2 -k1 \
+ | sort $SORTTMP -u "-t$flagmarker" -k1f,1 -k1 \
| $COMBINE $langtabs > $LEGALFLAGLIST
#
@@ -501,7 +501,7 @@
| sort $SORTTMP \
| uniq -c \
| tr ' ' ' ' \
- | sort $SORTTMP -k1rn,2 -k3 > $PRODUCTLIST
+ | sort $SORTTMP -k1rn,1 -k3 > $PRODUCTLIST
if [ `egrep ' p ' $PRODUCTLIST | wc -l` -gt 0 \
-a `egrep ' s ' $PRODUCTLIST | wc -l` -gt 0 ]
@@ -686,7 +686,7 @@
D
}' \
| comm -23 - $ILLEGALCOMBOS \
- | sort $SORTTMP -u "-t$flagmarker" -k1f,2 -k1 \
+ | sort $SORTTMP -u "-t$flagmarker" -k1f,1 -k1 \
| $COMBINE $langtabs > $CROSSROOTS
mv $CROSSROOTS $LEGALFLAGLIST
if [ "$debug" = yes ]
@@ -722,10 +722,10 @@
#
$verbose && echo 'Eliminating non-optimal affixes.' 1>&2
ispell "$wchars" -e4 -d $FAKEHASH -p /dev/null < $LEGALFLAGLIST \
- | sort $SORTTMP -k2,3 -k3rn,4 -k1,2 \
- | sort $SORTTMP -um -k2,3 \
+ | sort $SORTTMP -k2,2 -k3rn,3 -k1,1 \
+ | sort $SORTTMP -um -k2,2 \
| sed -e 's; .*$;;' \
- | sort $SORTTMP -u "-t$flagmarker" -k1f,2 -k1 > $MINIMALAFFIXES
+ | sort $SORTTMP -u "-t$flagmarker" -k1f,1 -k1 > $MINIMALAFFIXES
/bin/rm -f $LEGALFLAGLIST
#
# Now we're almost done. MINIMALAFFIXES covers some (with luck, most)
@@ -756,7 +756,7 @@
| sort $SORTTMP "-t$flagmarker" -u -k1f,1 -k1
else
# MINIMALAFFIXES is empty; just produce a sorted version of STRIPPEDINPUT
- sort $SORTTMP "-t$flagmarker" -u -k1f,2 -k1 $STRIPPEDINPUT
+ sort $SORTTMP "-t$flagmarker" -u -k1f,1 -k1 $STRIPPEDINPUT
fi
/bin/rm -rf ${TDIR}
if [ "X$MUNCHMAIL" != X ]