Hello,
On 2011-10-19 21:03, Felipe Monteiro de Carvalho wrote:
> On Wed, Oct 19, 2011 at 6:33 PM, Martin Schreiber<[email protected]> wrote:
>> Does it use locale-specific collation in PasUnicodeCompareStr and
>> PasUnicodeCompareText?
>
> Good point, no, not yet. But this affects only Turkish, Azeri and
> Lithuanian AFAIK
>
> Adding Turkish and Azeri is trivial, because UTF8LowerCase supports
> them, but I did not understand yet the rules for Lithuanian, they are
> quite convoluted, depend on nearby chars and stuff like that.
I am a native Lithuanian speaker, so I think I can help, at least by
providing information, but I must first understand what the problem is.

Do I understand correctly that "collation" means "sorting order"? In
that case, Lithuanian collation does not depend on nearby characters.
There are 32 letters and they follow this order:
Aa < Ąą < Bb < Cc < Čč < Dd < Ee < Ęę < Ėė < Ff < Gg < Hh < Ii < Įį < Yy
< Jj < Kk < Ll < Mm < Nn < Oo < Pp < Rr < Ss < Šš < Tt < Uu < Ųų < Ūū <
Vv < Zz < Žž
There are also some accented characters which are used only in linguistic
texts (for example, dictionaries). (The full list is here:
http://developer.mimer.com/charts/lithuanian.htm)
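
As a rough illustration of the alphabet order above, here is a minimal
sketch in Python (used only for illustration). The names LT_ALPHABET and
lt_sort_key are made up for this example and are not part of Lazarus or
FPC. Putting characters outside the 32 letters after them is my own
assumption, and accented characters are not handled here.

```python
import unicodedata

# Hypothetical sketch: sort words by the 32-letter Lithuanian alphabet.
# LT_ALPHABET and lt_sort_key are illustrative names, not an FPC/Lazarus API.
LT_ALPHABET = unicodedata.normalize("NFC", "aąbcčdeęėfghiįyjklmnoprsštuųūvzž")
LT_RANK = {ch: i for i, ch in enumerate(LT_ALPHABET)}


def lt_sort_key(word):
    """Return a list of per-letter ranks usable as a sort key."""
    text = unicodedata.normalize("NFC", word).lower()
    # Assumption: characters outside the alphabet (q, w, x, digits, ...)
    # sort after all Lithuanian letters, ordered by code point.
    return [LT_RANK.get(ch, len(LT_ALPHABET) + ord(ch)) for ch in text]


words = ["jūra", "yla", "ilgas", "įdomus", "čia", "cukrus"]
print(sorted(words, key=lt_sort_key))
```

Note that "yla" sorts before "jūra" here, because Yy comes right after
Įį and before Jj.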
The funny thing is that in dictionaries when "sorting" words, "Aa" and
"Ąą" (also: "Ee" and "Ęę" and "Ėė"; "Ii" and "Įį" and "Yy"; "Uu" and
"Ųų" and "Ūū") are treated as the "same letter".
BUT, for example, the words "šieną" <> "sieną" <> "siena" are all three
different words (there are no accents in these characters).
BUT I believe that accented characters should be treated as the same
letter: "šiẽną" = "šieną"; "siena" = "síena", because they are the same
word (accents do not change the meaning of a word, and the writer is not
required to provide them at all).
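
The idea above could be sketched like this (again Python, only for
illustration). The names ACCENTS, strip_accents and same_word are made
up for this example. I am assuming that the accent marks in question are
the combining grave, acute and tilde, while the ogonek, caron, dot above
and macron belong to the letters themselves and must be kept.

```python
import unicodedata

# Assumption: linguistic accent marks are the combining grave (U+0300),
# acute (U+0301) and tilde (U+0303). Ogonek, caron, dot above and macron
# are parts of Lithuanian letters and are left untouched.
ACCENTS = {"\u0300", "\u0301", "\u0303"}


def strip_accents(word):
    """Remove accent marks but keep the diacritics that form letters."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in ACCENTS)
    return unicodedata.normalize("NFC", kept)


def same_word(a, b):
    """Compare two words while ignoring accent marks only."""
    return strip_accents(a) == strip_accents(b)


print(same_word("šiẽną", "šieną"))  # accent ignored
print(same_word("šieną", "sieną"))  # š and s stay different letters
```

A comparison function could apply something like strip_accents first and
only then compare by alphabet order.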
I don't know if I managed to explain anything, but if you need help
with the Lithuanian language, feel free to contact me.
Regards,
Žilvinas Ledas