Bug#502356: marked as done (locales: sv_SE locale sometimes fail to collate 'v' and 'w' correctly)

Debian Bug Tracking System Wed, 15 Oct 2008 23:58:37 -0700

Your message dated Thu, 16 Oct 2008 08:49:28 +0200
with message-id <[EMAIL PROTECTED]>
and subject line Re: Bug#502356: locales: sv_SE locale sometimes fail to 
collate 'v' and 'w' correctly
has caused the Debian Bug report #502356,
regarding locales: sv_SE locale sometimes fail to collate 'v' and 'w' correctly
to be marked as done.


This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [EMAIL PROTECTED]
immediately.)


-- 
502356: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=502356
Debian Bug Tracking System
Contact [EMAIL PROTECTED] with problems

--- Begin Message ---

Package: locales
Version: 2.3.6.ds1-13etch7
Severity: normal
Tags: l10n


I was doing a bit of C++ programming, and replacing my own Swedish collation
algorithm with the standard locales (through the standard C++ std::locale
interface), when my unit tests started to fail. It turned out I could repeat
it with the standard sort utility, so that's what I'll use here.
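
A minimal sketch of how such a comparison can be made through std::locale.
This is an illustration only, not the reporter's unit-test code: the helper
name collates_before is hypothetical, and the locale name passed in must be
one that has actually been generated on the system.

```cpp
#include <locale>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true if `a` collates before `b` in the
// named locale, or std::nullopt if that locale cannot be constructed.
std::optional<bool> collates_before(const std::string& a,
                                    const std::string& b,
                                    const char* locale_name) {
    try {
        // Throws std::runtime_error if the name is not a valid locale.
        std::locale loc(locale_name);
        // std::locale::operator() compares two strings using the
        // locale's std::collate facet.
        return loc(a, b);
    } catch (const std::runtime_error&) {
        return std::nullopt;
    }
}
```

With a locale that follows the rule quoted from the sv_SE source file,
collates_before("vword", "wword", "sv_SE.iso88591") would be expected to
yield true; a result of false corresponds to the sort output reported as
wrong in this bug.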

This quote from /usr/share/i18n/locales/sv_SE describes what the locale
intends to implement, and it's also the rule I am familiar with from real
life:

% The letter w is normally not present in the Swedish alphabet. It
% exists in some names in Swedish and foreign words, but is accounted
% for as a variant of 'v'.  Words and names with 'w' are in Swedish
% ordered alphabetically among the words and names with 'v'. If two
% words or names are only to be distinguished by 'v' or % 'w', 'v' is
% placed before 'w'.

And that seems to work *some* of the time ... out of the following three
examples, the first two are OK and show how it should work. The third is
simply wrong: "vword" and "wword" are identical except for the first letter,
where "wword" has the 'w' variant of the letter 'v', so "wword" should
collate last.

tuva:~> /bin/echo -e "word\nvorm" | env LC_COLLATE=sv_SE.iso88591 sort
word
vorm
tuva:~> /bin/echo -e "word\nvord" | env LC_COLLATE=sv_SE.iso88591 sort
vord
word
tuva:~> /bin/echo -e "vword\nwword" | env LC_COLLATE=sv_SE.iso88591 sort
wword
vword
tuva:~> 
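
The ordering expected from the quoted rule can be sketched as a two-level
comparison: first compare with every 'w' treated as 'v', and only if that is
a tie let 'v' sort before 'w'. The function names below are hypothetical,
the sketch handles lowercase ASCII only, and it is not how glibc implements
collation; it only models the v/w rule described in the locale source.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Primary key: fold 'w' into 'v' so both letters compare as equal.
std::string primary_key(std::string s) {
    std::replace(s.begin(), s.end(), 'w', 'v');
    return s;
}

// True if `a` should sort before `b` under the two-level v/w rule.
bool vw_less(const std::string& a, const std::string& b) {
    const std::string ka = primary_key(a);
    const std::string kb = primary_key(b);
    if (ka != kb) {
        return ka < kb;  // the words differ by more than v/w
    }
    // Tie-break: 'v' precedes 'w' at the first position where they differ.
    return a < b;
}

// Returns a copy of `words` sorted with vw_less.
std::vector<std::string> sorted_vw(std::vector<std::string> words) {
    std::sort(words.begin(), words.end(), vw_less);
    return words;
}
```

Under this model the three inputs above come out as word/vorm, vord/word and
vword/wword, which matches the first two sort results and is the reverse of
the third.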

I have not done any further experiments to see what triggers it. I cannot help
suspecting that similar rules for other languages are affected as well ...

Final side note: Solaris 8 passes this test. That's the only other Unix I've
tested.

regards,
Jorgen

-- System Information:
Debian Release: 4.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: powerpc (ppc)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-3-powerpc
Locale: LANG=sv_SE.utf8, LC_CTYPE=sv_SE.utf8 (charmap=UTF-8)

Versions of packages locales depends on:
ii  debconf [debconf-2.0]  1.5.11etch2       Debian configuration management sy
ii  libc6 [glibc-2.3.6.ds1 2.3.6.ds1-13etch7 GNU C Library: Shared libraries

locales recommends no packages.

-- debconf information:
  locales/default_environment_locale: en_US
  locales/locales_to_be_generated: en_US ISO-8859-1, sv_SE.UTF-8 UTF-8, sv_SE ISO-8859-1

--- End Message ---

--- Begin Message ---

Version: 2.7-1

Jorgen Grahn wrote:
> Package: locales
> Version: 2.3.6.ds1-13etch7
> Severity: normal
> Tags: l10n
> 
> 
> I was doing a bit of C++ programming, and replacing my own Swedish collation
> algorithm with the standard locales (through the standard C++ std::locale
> interface), when my unit tests started to fail. It turned out I could repeat
> it with the standard sort utility, so that's what I'll use here.
> 
> This quote from /usr/share/i18n/locales/sv_SE describes what the locale
> intends to implement, and it's also the rule I am familiar with from real
> life:
> 
> % The letter w is normally not present in the Swedish alphabet. It
> % exists in some names in Swedish and foreign words, but is accounted
> % for as a variant of 'v'.  Words and names with 'w' are in Swedish
> % ordered alphabetically among the words and names with 'v'. If two
> % words or names are only to be distinguished by 'v' or % 'w', 'v' is
> % placed before 'w'.
> 
> And that seems to work *some* of the time ... out of the following three
> examples, the first two are OK and show how it should work. The third is
> simply wrong: "vword" and "wword" are identical except for the first letter,
> where "wword" has the 'w' variant of the letter 'v', so "wword" should
> collate last.
> 
> tuva:~> /bin/echo -e "word\nvorm" | env LC_COLLATE=sv_SE.iso88591 sort
> word
> vorm
> tuva:~> /bin/echo -e "word\nvord" | env LC_COLLATE=sv_SE.iso88591 sort
> vord
> word
> tuva:~> /bin/echo -e "vword\nwword" | env LC_COLLATE=sv_SE.iso88591 sort
> wword
> vword
> tuva:~> 
> 
> I have not done any further experiments to see what triggers it. I cannot help
> suspecting that similar rules for other languages are affected as well ...
> 

The bug is fixed in glibc 2.7 and later. Closing the bug for those
versions.

-- 
  .''`.  Aurelien Jarno             | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   [EMAIL PROTECTED]         | [EMAIL PROTECTED]
   `-    people.debian.org/~aurel32 | www.aurel32.net

--- End Message ---
