Re: [Harbour] CodePage different behaviour between 2.0.0beta3 and 2.0.0 (Win)

Vitomir Cvitanovic Thu, 31 Dec 2009 01:59:47 -0800

Hi Przemek,

I'll update Slovenian and Croatian CPs in Harbour repository but I have


Thanks  :))

Neither Slovenian nor Croatian collation defines the relations to X and Y
characters. OK, but I do not believe that you never have to sort words
containing them, so what is the common practice in such situations? I.e. how
are words starting with X or Y sorted in printed vocabularies? Or maybe

Perhaps I misunderstood your question, but it seems that both you and I have clarified that:


Alfa
Beta
....
Vito
Work
Xeon
Yes
Zambia
Žaba  (that's why I had "žabeceda")

The second part is addressed to Vito and Croatian users.
You have three digraph letters: Dž, Lj and Nj.
As far as I can see, this creates a few problems. These digraphs should be
sorted as single letters in a precisely defined order, which is not compatible
with simple single-character sorting. Dž is sorted as expected between
Dz and Đ, but Lj and Nj are not. Lj should be sorted between Lz and M, and
Nj between Nz and O. Current Harbour code allows such a collation to be
defined, but I do not know if it's important to introduce it. It is certainly
not compatible with Clipper, so if we add it we also have to keep a
CP which is strictly ntxcor.obj compatible. You may even want to
duplicate all existing Croatian CPs, because the current ones,
without native support for digraphs, are in fact the same as the Slovenian
CPs, only with different names.
Do you think it's important to add support for Croatian (and maybe Latin
Serbian) collation respecting special order of digraphs?
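
To make the digraph ordering described above concrete, here is a minimal sketch in Python. It is not Harbour's codepage code; the alphabet table, the placement of Q, W, X and Y in their usual Latin positions, and the names ALPHABET, RANK, DIGRAPHS and collation_key are all assumptions made for illustration.

```python
# Hypothetical letter order: the Croatian alphabet with dž, lj, nj as
# single letters, plus q, w, x, y in their usual Latin positions
# (that placement is an assumption, not a rule taken from this thread).
ALPHABET = ["a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f", "g", "h",
            "i", "j", "k", "l", "lj", "m", "n", "nj", "o", "p", "q", "r",
            "s", "š", "t", "u", "v", "w", "x", "y", "z", "ž"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
DIGRAPHS = ("dž", "lj", "nj")


def collation_key(word):
    """Return a list of letter ranks, matching digraphs greedily."""
    text = word.lower()  # folds "DŽ", "Dž" and "dž" to the same form
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            key.append(RANK[pair])
            i += 2
        else:
            # Characters outside the table sort after all letters.
            key.append(RANK.get(text[i], len(ALPHABET)))
            i += 1
    return key


words = ["Nula", "Njuh", "Luk", "Ljeto", "Nož", "More", "Lzx", "Nzx", "Oko"]
print(sorted(words, key=collation_key))
# ['Luk', 'Lzx', 'Ljeto', 'More', 'Nož', 'Nula', 'Nzx', 'Njuh', 'Oko']
```

With this key, Lj lands between Lz and M and Nj between Nz and O. The greedy matching treats every "nj", "lj" or "dž" pair as a digraph, so it cannot tell apart words where the two characters are separate letters; a real implementation would need some way to handle that case.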

I must say that this is (again) not only a Croatian problem. It is the same in all ex-YU countries. The "letters" DŽ, LJ, NJ are somehow special. But in Clipper we didn't have support for two-byte diacritics, so if we treat them as single letters it would be OK, and we would be Clipper compatible.

If we start with a full implementation, this could bring some more problems. For instance DŽ - we could find it as "DŽ" (uppercase), "dž" (lowercase), and "Dž" (mixed case - for instance in a name, at the beginning of a sentence...). In general I would vote for Clipper compatibility and treating these special "letters" as two letters.


Off the record:

In my Win apps where the database is MS SQL Server, my setting for the database is always "Croatian_CI_AS" (Case-Insensitive, Accent-Sensitive), so for instance "NJxxxx" is positioned between "Nxxxxx" and "Mxxxxxxx". Other possible combinations would be Croatian_CS_AS, Croatian_CI_AI, Croatian_CS_AI (a total of FOUR combinations :( )

I also found the information that these digraphs were not well chosen
because sometimes such character combinations should be used as separate
letters and real digraphs have their own Unicode values.
Is it true?

I think that this is not true. NJ, DŽ, LJ always have the same meaning. Perhaps the mistake comes from the possible combinations (DŽ/dž/Dž, NJ/nj/Nj, LJ/lj/Lj) as written above.

If yes can you precisely tell me what are these Unicode values for upper
and lower letters? Do you use them in real life? Are they supported by some
CPs and/or do you have hardware support (keyboards?) for using them?

No, our keyboards (I can send you a picture :) ) are almost standard QWERTZ, where our symbols are near <return>:

QWERTZUIOPŠĐ
ASDFGHJKLČĆŽ
<YXCVBNM,.-
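
On the question of Unicode values for the digraphs: Unicode does define single code points for them, U+01C4 to U+01CC (upper, title and lower case for each of DŽ, LJ, NJ). The sketch below uses only Python's standard unicodedata module to list them and to show that compatibility normalization (NFKC) turns each one back into the ordinary two-letter spelling. The helper names split_digraphs and encodable are hypothetical, chosen for this example.

```python
import unicodedata


def split_digraphs(text):
    """Replace single-code-point digraphs by their two-letter spelling."""
    return unicodedata.normalize("NFKC", text)


def encodable(ch, codec):
    """Return True if the character exists in the given code page."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# U+01C4..U+01CC: upper, title and lower case forms of DŽ, LJ, NJ.
for cp in range(0x01C4, 0x01CD):
    ch = chr(cp)
    print(f"U+{cp:04X}  {unicodedata.name(ch)}  ->  {split_digraphs(ch)!r}")

# Check whether two single-byte Central European code pages carry them.
for codec in ("cp1250", "cp852"):
    print(codec, encodable("\u01C4", codec))
```

Each of these characters has length 1 as a Python string, while its NFKC form has length 2, which is one way to see the difference between the "real" digraph characters and the two-letter spelling typed on a QWERTZ keyboard.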

Is it something that has to be resolved in the future or rather you will
try to adopt existing solutions - i.e. in Poland our own national keyboard
layout is dying and now most of us prefer standard QWERTY layout with
ALT-GR used to insert Polish national characters (we call it 'Polish
programmer keyboard').

Now, perhaps you know why I like CROSCII so much. When I started with computers the only CP was US-ASCII. So today I always have two keyboard layouts installed (HR and US). When programming I don't have to use ALT-GR combinations for [, ], {, } because they are in their standard places :). All I have to do is switch to 'EN' and I have 'standard' US-ASCII :)

The answers should help me also in the future when I work on Unicode
support in Harbour.

From my point of view Unicode is a more than great thing, but it leads to some disadvantages:

MS in their SQL Server have limits for the char and varchar data types: 8000 characters. Of course, if you decide to use nchar or nvarchar (Unicode), this limit is 4000 :( They are dealing with bytes, not letters (characters), so I don't use Unicode data types. If it is necessary, I would use a different collation for the table that needs it.
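
The bytes-versus-characters point can be illustrated with a small Python sketch. It assumes that the Unicode column types store two bytes per character for these letters (modelled here with the utf-16-le codec) and uses cp1250 as a stand-in for a single-byte Central European code page; the 8000 figure is the limit quoted above, and LIMIT_BYTES and max_chars are names invented for the example.

```python
LIMIT_BYTES = 8000  # the limit quoted above for char/varchar


def max_chars(bytes_per_char, limit=LIMIT_BYTES):
    """How many characters fit if the limit is counted in bytes."""
    return limit // bytes_per_char


text = "Žaba"
single = text.encode("cp1250")    # single-byte code page: 1 byte per letter
wide = text.encode("utf-16-le")   # 2 bytes per letter for this text

print(len(text), len(single), len(wide))  # 4 4 8
print(max_chars(1), max_chars(2))         # 8000 4000
```

The same four letters take twice the storage in the two-byte encoding, which is where the halved limit comes from.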


best regards,
Vito

P.S. Sorry for my language, but EN is not my native language :))

_______________________________________________
Harbour mailing list (attachment size limit: 40KB)
[email protected]
http://lists.harbour-project.org/mailman/listinfo/harbour
