https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=36947
David Cook <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|Needs Signoff               |Failed QA

--- Comment #4 from David Cook <[email protected]> ---
(In reply to David Cook from comment #3)
> I agree about Unicode::Normalize being the way to go for the diacritics...
>
> As for the normalization form... a quick Google suggest that NFKD is most
> likely the correct normalization form to use, although it might only help in
> terms of the initial sorting based on the first letter. For instance, I
> think ÅB should become A + a combining ring above (030A bytes) + B.

Indeed, I think that my assessment is (mostly) correct. Consider the following code and its output:

#!/usr/bin/perl
use utf8;
use Modern::Perl;
use Unicode::Normalize;

binmode(STDOUT, ":encoding(UTF-8)");

my @stuff = (
    "ab",
    "Aa",
    "a_",
    "bone",
    "Bad",
    "Åa",
);

my @sorted = sort { NFKD(uc($a)) cmp NFKD(uc($b)) } @stuff;

use Data::Dumper;
warn Dumper(\@sorted);

foreach my $thing (@sorted){
    print NFKD($thing) . "\n";
}

perl testunicode.pl | xxd
$VAR1 = [
          'Aa',
          'ab',
          'a_',
          "\x{c5}a",
          'Bad',
          'bone'
        ];
00000000: 4161 0a61 620a 615f 0a41 cc8a 610a 4261  Aa.ab.a_.A..a.Ba
00000010: 640a 626f 6e65 0a                        d.bone.

You can see that it hasn't sorted the array properly: "Åa" ends up after "a_" instead of next to "Aa". NFKD broke the Å character (bytes c385) into the bytes 41cc8a, which is A + "Unicode Character 'COMBINING RING ABOVE' (U+030A)", whose UTF-8 encoding is the bytes CC8A. While the uppercasing has grouped all the As together, the comparison then falls through to the following characters, including the combining marks. We can see that the "combining ring above" (U+030A) sorts after the underscore punctuation (U+005F).

If we're going to do a proper comparison, I think that we're going to need to remove the diacritics completely.

--
You are receiving this mail because:
You are watching all bug changes.
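A minimal sketch of the "remove the diacritics completely" idea from comment #4, assuming that stripping non-spacing marks (\p{Mn}) after NFKD is an acceptable definition of "removing diacritics". The sub names strip_diacritics_key and sort_without_diacritics are hypothetical and are not existing Koha functions. It keeps the uc() from the test script above, and breaks ties with a plain cmp on the original strings so that "Aa" and "Åa" still come out in a stable order.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFKD);

binmode(STDOUT, ":encoding(UTF-8)");

# Hypothetical helper: build a comparison key with diacritics removed.
# NFKD splits "Å" into "A" + U+030A COMBINING RING ABOVE, the regex
# drops every non-spacing mark (\p{Mn}), and uc() folds case the same
# way the original test script did.
sub strip_diacritics_key {
    my ($s) = @_;
    my $key = NFKD($s);
    $key =~ s/\p{Mn}//g;
    return uc($key);
}

# Hypothetical helper: sort on the stripped key (Schwartzian transform,
# so each key is computed once), falling back to the original string
# when two keys are equal.
sub sort_without_diacritics {
    my @items = @_;
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] }
           map  { [ strip_diacritics_key($_), $_ ] } @items;
}

# Same test data as comment #4.
print "$_\n" for sort_without_diacritics("ab", "Aa", "a_", "bone", "Bad", "Åa");
```

For the test data this prints Aa, Åa, ab, a_, Bad, bone (one per line), so "Åa" now sits next to "Aa". Note that the underscore still sorts after the uppercase letters because the key is still compared codepoint by codepoint; this sketch only removes the marks and does not attempt locale-aware collation.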
_______________________________________________
Koha-bugs mailing list
[email protected]
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/
