Re: Converting a character to upper case in string

Patrick Schluter via Digitalmars-d-learn Sat, 22 Sep 2018 14:05:44 -0700

On Saturday, 22 September 2018 at 06:01:20 UTC, Vladimir Panteleev wrote:

On Friday, 21 September 2018 at 12:15:52 UTC, NX wrote:
How can I properly convert a character, say, first one to upper case in a unicode correct manner?
That would depend on how you'd define correctness. If your application needs to support "all" languages, then (depending on how you interpret it) the task may not be meaningful, as some languages don't have the notion of "upper-case" or even "character" (as an individual glyph). Some languages do have those notions, but they serve a specific purpose that doesn't align with the one in English (e.g. Lojban).

There are other traps in the question of uppercase/lowercase which indeed make it very difficult to handle correctly if we don't define what "correctly" means.

Examples:

- It may be necessary to know the locale, i.e. the language of the string to uppercase. In Turkish, the uppercase of i is not I but İ, and the lowercase of I is ı (that was a reason for the calamitous low performance of toUpper/toLower in Java, for example).

- Some uppercases depend on what they are used for. German ß should be uppercased as SS in normal text (note also, btw, that 1 code point becomes 2 in uppercase), but for calligraphic work, road signs and other usages it can be capital ẞ.

- Greek has a single uppercase form Σ but two lowercase forms, σ and ς, depending on the position in the word.

- While it becomes less and less relevant, Serbo-Croatian may use digraphs when transcoding the script from Cyrillic (Serbian) to Latin (Croatian). These digraphs have 2 uppercase forms (title case and all capitals):

  - ǆ -> Ǆ or ǅ
  - ǉ -> Ǉ or ǈ
  - ǌ -> Ǌ or ǋ
Normalization would normally take care of that case.

- Some languages may modify or remove diacritical signs when uppercasing. It is quite usual in French to not put accents on capitals.

It is also clear that the operation of uppercasing is not symmetric with lowercasing.
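To make a few of these traps concrete, here is a small sketch in Python, chosen only because its built-in string methods apply the default, locale-independent Unicode case mappings. The variable names are mine, and nothing here says how D's std.uni behaves for the same inputs; that would need to be checked separately.

```python
# Default (locale-independent) Unicode case mappings, as applied by
# Python's built-in str methods. Illustration only, not a D API.

# German sharp s: one code point becomes two when uppercased.
sharp_s_upper = "\u00df".upper()            # ß -> SS

# Uppercasing is not symmetric with lowercasing: the round trip
# does not bring back the original ß.
round_trip = "\u00df".upper().lower()       # ß -> SS -> ss

# Greek: the single uppercase Σ lowercases to ς at the end of a word
# and to σ elsewhere.
greek_lower = "\u039f\u0394\u039f\u03a3".lower()   # ΟΔΟΣ -> οδος

# Digraph ǆ has distinct all-capital and title-case forms.
dz_upper = "\u01c6".upper()                 # ǆ -> Ǆ
dz_title = "\u01c6".title()                 # ǆ -> ǅ

# Turkish: without locale information, i still maps to I, not İ.
i_upper = "i".upper()
```

Getting the Turkish result right needs a locale-aware API (ICU, for instance); the default mappings alone cannot do it.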

In which code level I should be working on? Grapheme? Or maybe code point is sufficient?
Using graphemes is necessary if you need to support e.g. combining marks (e.g. ◌̏ + S = S̏).
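As a rough illustration of why the code point alone is not always enough, the sketch below uppercases the first base character together with any combining marks attached to it. It is only an approximation of a grapheme cluster, assuming "base character plus following combining marks"; full segmentation as defined in UAX #29 covers more cases. The helper name upper_first_cluster is hypothetical. In D, std.uni.byGrapheme would presumably be the place to start.

```python
import unicodedata


def upper_first_cluster(text: str) -> str:
    """Uppercase the first base character plus its combining marks.

    Approximation only: a "cluster" here is one code point followed by
    any code points with a non-zero canonical combining class.
    """
    if not text:
        return text
    end = 1
    # Extend the cluster over the combining marks that follow the base.
    while end < len(text) and unicodedata.combining(text[end]) != 0:
        end += 1
    # Uppercase the whole cluster; the result may be longer than the
    # input (e.g. ß -> SS), so slicing by cluster avoids splitting it.
    return text[:end].upper() + text[end:]


# "s" + COMBINING DOUBLE GRAVE ACCENT (U+030F) + "on"
word = "s\u030fon"
result = upper_first_cluster(word)
```

The combining mark stays attached to the uppercased base, so the accent is not lost or moved to the next letter.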
