Hello,

On 2011-10-19 21:03, Felipe Monteiro de Carvalho wrote:
> On Wed, Oct 19, 2011 at 6:33 PM, Martin Schreiber<[email protected]> wrote:
>> Does it use locale specific collation in PasUnicodeCompareStr and
>> PasUnicodeCompareText?
> Good point, no, not yet. But this affects only turkish, azeri and
> lithuanian AFAIK
>
> Adding turkish and azeri is trivial, because UTF8LowerCase supports
> them, but I did not understand yet the rules for Lithuanian, they are
> quite convoluted, depend on nearby chars and stuff like that.
I am a native Lithuanian speaker, so I think I can help, at least by providing information, but I must first understand what the problem is. Do I understand correctly that "collation" means "sorting order"? In that case, Lithuanian sorting does not depend on nearby characters.
There are 32 letters and they follow this order:
Aa < Ąą < Bb < Cc < Čč < Dd < Ee < Ęę < Ėė < Ff < Gg < Hh < Ii < Įį < Yy < Jj < Kk < Ll < Mm < Nn < Oo < Pp < Rr < Ss < Šš < Tt < Uu < Ųų < Ūū < Vv < Zz < Žž
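
The ordering above can be turned into a sort key. This is only a minimal sketch in Python (not the Pascal code discussed in this thread); the names LT_ALPHABET, LT_RANK and lt_sort_key are made up for illustration, and placing letters outside the alphabet after "ž" is an assumption, not a rule stated here.

```python
# Alphabet order as listed in the message above (32 letters, lowercase).
LT_ALPHABET = "aąbcčdeęėfghiįyjklmnoprsštuųūvzž"
LT_RANK = {ch: i for i, ch in enumerate(LT_ALPHABET)}


def lt_sort_key(word):
    """Hypothetical helper: rank each letter by its position in LT_ALPHABET."""
    # Assumption: characters outside the alphabet (q, w, x, digits, ...)
    # sort after "ž", ordered among themselves by code point.
    return [(LT_RANK.get(ch, len(LT_ALPHABET)), ch) for ch in word.lower()]


words = ["zebra", "yla", "jis", "ąžuolas"]
print(sorted(words, key=lt_sort_key))  # ['ąžuolas', 'yla', 'jis', 'zebra']
```

Note that "yla" comes before "jis" because Yy follows Įį in the alphabet, which a plain code point comparison would not reproduce.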

There are also some accented characters that are used only in linguistic texts (for example, dictionaries). (The full list is here: http://developer.mimer.com/charts/lithuanian.htm)

The funny thing is that in dictionaries, when "sorting" words, "Aa" and "Ąą" (also "Ee", "Ęę" and "Ėė"; "Ii", "Įį" and "Yy"; "Uu", "Ųų" and "Ūū") are treated as the "same letter". BUT, for example, "šieną" <> "sieną" <> "siena": all three are different words (these characters carry no accents). BUT I believe that accented characters should be treated as the same letter: "šiẽną" = "šieną"; "siena" = "síena", because it is the same word (accents do not change the word's meaning, and writers are not required to provide them at all).
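
To make the two ideas above concrete, here is a rough Python sketch (again, not the Pascal code from this thread). It assumes that the stress accents are the combining grave, acute and tilde marks, and that "same letter" in a dictionary can be modelled by folding each group to its base letter; strip_accents and same_dictionary_letter are hypothetical names.

```python
import unicodedata

# Assumption: stress accents are the combining grave, acute and tilde marks.
ACCENT_MARKS = {"\u0300", "\u0301", "\u0303"}


def strip_accents(word):
    """Hypothetical helper: drop stress accents, keep letters like š, ą, ė."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in ACCENT_MARKS)
    # Recompose so that ogonek, caron, dot and macron rejoin their letters.
    return unicodedata.normalize("NFC", kept)


# Groups filed under one dictionary heading, per the message above.
DICTIONARY_GROUP = str.maketrans("ąęėįyųū", "aeeiiuu")


def same_dictionary_letter(a, b):
    """True if two letters fall under the same dictionary heading."""
    return a.lower().translate(DICTIONARY_GROUP) == b.lower().translate(DICTIONARY_GROUP)


print(strip_accents("šiẽną") == "šieną")   # True
print(strip_accents("šieną") == "sieną")   # False: š is a letter, not an accent
print(same_dictionary_letter("Į", "y"))    # True
```

Only the three accent marks are removed, so "šieną", "sieną" and "siena" stay distinct words while accented spellings collapse to the plain ones.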

I don't know if I managed to explain anything, but if you need some help with the Lithuanian language, feel free to contact me.


Regards,
Žilvinas Ledas

--
_______________________________________________
Lazarus mailing list
[email protected]
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
