Rick McGowan posted and was answered by John Hudson:

If there isn't a visual difference here, how could there be a lexical
difference? Imagine the age before computers. All you have to go on is
what's on the page. There isn't an inherent order in those elements; they
could have been written by the scribe in any order. If they appear the
same, you can't assign different meanings -- except by some extra-syllabic
informational context... right?

On the page, you would know -- or hopefully know -- from context. But a search engine or a sorting algorithm looking at the characters presumably needs to know the difference without additional context, hence the character ordering is important.

I think such distinctions are more than one should expect from a standard search engine or from simple sorting.


To take French as an example, I would not expect software to be able to tell whether the abbreviation "M." in "M. Bouteillier" stands for "Monsieur" or a name like "Marcel".

How do you know except from context whether "med." stands for "medical" or "medieval"?

In a company name such as "Perrault & Lavigne", should "&" sort according to the default Unicode order, or as "and", or as "et"?

Should it be found by searches on "and", "et", "und", and so forth?

This is the business of application protocols and application utilities.

Indicating the proper expansion of abbreviations for sorting and searching seems to me to be beyond what Unicode tries to do, and beyond what it reasonably can do.
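
As a rough illustration of where such a decision would live, here is a minimal Python sketch of an application-level sort key that spells out "&" according to a language the application has chosen. The table AMPERSAND_WORDS and the function sort_key are hypothetical names invented for this example; nothing here is defined by Unicode, and real collation would use a proper collation library rather than casefold().

```python
# Hypothetical, application-chosen expansions for "&"; not defined by Unicode.
AMPERSAND_WORDS = {"en": "and", "fr": "et", "de": "und"}


def sort_key(name: str, lang: str) -> str:
    """Build a simple sort key that spells out "&" in the given language."""
    word = AMPERSAND_WORDS.get(lang)
    if word is not None:
        name = name.replace("&", word)
    # casefold() is only a crude stand-in for real, locale-aware collation.
    # split()/join() collapses runs of whitespace into single spaces.
    return " ".join(name.casefold().split())


names = ["Perrault et Fils", "Perrault & Lavigne", "Perrault Fabre"]

# Plain code point order: "&" sorts before any letter.
by_code_point = sorted(names)

# Application-level order: the same list, with "&" treated as "et" or "and".
as_french = sorted(names, key=lambda n: sort_key(n, "fr"))
as_english = sorted(names, key=lambda n: sort_key(n, "en"))

print(by_code_point)
print(as_french)
print(as_english)
```

The three orderings differ, and the text alone gives no way to decide which one is right. That choice has to come from the application, which knows (or guesses) the language of the data.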

If lexical forms in any language have variant meanings, those are not for Unicode to distinguish. The occasional exception is when Unicode encodes separate characters that have identical glyphs but very different properties, such as "!" (U+0021) for punctuation and "ǃ" (U+01C3) for a Zulu click, in the hope, probably vain, that people in general will recognize the difference.
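
The difference in properties between the exclamation mark and the click letter can be inspected with Python's standard unicodedata module. This is only a small sketch; it assumes the click letter meant here is U+01C3, the character Unicode encodes as a look-alike of U+0021.

```python
import unicodedata

# U+0021 is the punctuation mark; U+01C3 is the look-alike click letter.
for ch in ("\u0021", "\u01C3"):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),  # general category, e.g. punctuation vs. letter
        ch.isalpha(),              # whether Python treats it as a letter
    )
```

A search or sort routine that consults these properties can treat the two characters differently even though they look the same on the page, but only if the author of the text typed the right one.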


Jim Allan
















