Re: [HOT] Name tag in non-latin script - hindrance for NGOs/aid agencies?

Nasir Khan Thu, 28 Nov 2019 11:14:07 -0800

Hi,
I believe a little more information needs to be added here to point the
discussion in the right direction. The language in question does not use
the Latin script, but its Unicode block is completely fine. So there is no
issue in writing that language anywhere on the internet, and no additional
special font is needed to render it properly. And it is true that Unicode
contains the letters of all the languages, so if a font covers the
language-specific Unicode block, the text should be displayed properly.
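
As a rough illustration of the encoding point (a sketch only: the sample
string, "Dhaka" written in Bengali script, and the helper names `describe`
and `utf8_round_trip` are assumptions made for this example, not taken from
OSM data), Python's standard `unicodedata` module shows that each letter is
an ordinary Unicode code point and that the text survives a UTF-8 round
trip. Whether it is then drawn correctly still depends on the font.

```python
import unicodedata

# Assumed sample for illustration: "Dhaka" in Bengali script, written as
# escapes so the source file itself stays plain ASCII.
sample = "\u09a2\u09be\u0995\u09be"


def describe(text):
    """Return (code point, Unicode character name) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


def utf8_round_trip(text):
    """Encode to UTF-8 and decode again; True if nothing was lost."""
    return text.encode("utf-8").decode("utf-8") == text


for code_point, name in describe(sample):
    print(code_point, name)
print("UTF-8 round trip ok:", utf8_round_trip(sample))
```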


So far, from my experience, I can say that the country's map is not
complete in OSM. Being an open source product, there is a trust and
dependability issue as well. More people are trying to use it and showing
interest nowadays, because Google is expensive. What are the outlets for
using OSM from desktop and mobile? Those who have been active and
contributing for longer than me can easily list the most popular apps or
sites for using OSM. How many of them support complex non-Latin Unicode
characters perfectly? I found only a very few, but there could be more.

So the map is incomplete, with little data; there is also incorrect data,
and we are forcing it to move to a state where it will become completely
useless, because the software cannot show the text properly. Will that be
helpful for the language or for the country?

I am not against using the native language; rather, I have been
contributing to Wikipedia and a number of open source communities in the
same language version for more than 12 years. I am also involved in a
number of language-specific national expert committees. But here I am
giving my opinion not to use the native language, at least for the time
being.

It is true that all the people of the country are comfortable with and
prefer the native language. Can you please provide me data on what
percentage of them are using the OSM website and how many of them have a
navigation app which is based on OSM? What are the use cases we are
targeting to cover?

If we write `name` in English for now, add the native name in `name:xx`
and also add English in `name:en`, will it be impossible to move all the
`name:xx` values to `name` when the scenario improves? I believe it could
be done via automated scripts.
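
A minimal sketch of what such a script could do for a single object is
below. It is an illustration under stated assumptions, not a tested import
tool: tags are modelled as a plain Python dict, `promote_local_name` is a
hypothetical helper name, and `bn` stands in for the `xx` language code. A
real migration would also have to read and write OSM data and follow the
community's rules for automated edits.

```python
def promote_local_name(tags, lang):
    """Return a copy of `tags` with name:<lang> copied into name.

    Hypothetical helper for illustration; `tags` is a plain dict of
    OSM-style key/value pairs and is not modified in place.
    """
    local_key = f"name:{lang}"
    updated = dict(tags)
    if local_key not in updated:
        return updated  # no local name recorded, leave the object alone
    # Keep the current (English) name under name:en if it is not there yet.
    if "name" in updated and "name:en" not in updated:
        updated["name:en"] = updated["name"]
    updated["name"] = updated[local_key]
    return updated


# Assumed example object; the Bengali value is written as escapes.
example = {"name": "Dhaka", "name:bn": "\u09a2\u09be\u0995\u09be"}
print(promote_local_name(example, "bn"))
```

Because `name:xx` and `name:en` are both kept, nothing is lost by the move,
and objects without a `name:xx` tag are left unchanged.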

Regards
Nasir Khan

--
*Nasir Khan Saikat*
www.nasirkhn.com



On Fri, 29 Nov 2019 at 00:18, Philippe Verdy <verd...@wanadoo.fr> wrote:

> XML never started from scratch based on old versions of SGML or any
> updated version of SGML.
> When it was created, Unicode was already there and its support in XML was
> mandatory from the start, including the support for UTF-8 by default. And
> it was based on the earlier work on XHTML which already included Unicode
> support by default as well, from the current development of HTML4 which was
> also updated to enforce the behavior for Unicode (notably it was made clear
> that to be conforming, the numeric character references could only refer to
> the UCS codepoints, independently of the charset used for the document, and
> that all charsets had to have a mapping to the UCS).
>
> Now the issue is possibly elsewhere: when languages use a script or
> orthography not based on Unicode because it is still not well supported or
> has problems.
> - there were problems for Korean in Unicode 1.0 before the merge with
> ISO 10646, but Unicode 1.0 has long been dead and no software today
> makes any reference to Unicode 1.0;
> - there have been problems with the Unicode encoding for Burmese and
> Mongolian; they are mostly solved, except for Mongolian, where work is
> still pending on the behavior of some clusters and the best way to encode
> the vowels. This will soon change, but yes, in that case there are
> problems; the change will not be about adopting Unicode or not, but about
> the best sequences of Unicode characters to use to represent these
> clusters: this is an orthographic change, not a change of encodings, but
> yes, in that case it means changing Unicode fonts for other updated
> Unicode fonts; no hacks based on legacy charsets are involved.
>
> Now there remain languages/scripts not encoded at all (not in Unicode and
> not even in any other charset): making a reference to a legacy ISO charset
> is inapplicable as there's no such legacy charset. All that can be done for
> now in these languages is to use some transliteration (but not necessarily
> Latin): Uyghur for example is generally written in that case using Chinese
> sinograms (with some specific forms in rare cases), or Arabic (with some
> additional diacritics and forms, but if these forms are not handled in
> fonts, at least there's a basic orthography that is readable, the same way
> that we can substitute some characters in Latin or remove some diacritics
> for African languages, or simply not encode some ligatures by writing
> digrams instead: this is what happens already when these languages are used
> in some international documents and forms like passports: there's a
> degraded orthography, but this is still readable and sufficiently
> distinctive for practical uses, and isolated text fragments are not the
> only source of disambiguation as there is other contextual information,
> including photo and biometric data or unique identifiers, and a scanned
> handwritten signature, plus personal data, including address for
> identification purposes).
>
> Anyway, even if there's a preferred orthography, slight deviation in
> orthography is very common and frequently used in public displays or
> advertising, and no one is confused. And the "preferred" orthography is
> just a matter of choice and is unstable across time, or even space when
> there are competing authorities providing their own local terminology for
> some local official uses, and not mandatory everywhere (and most languages
> also have a lot of dialects that may use different orthographies to render
> their own local phonology and accents: not everyone agrees with these
> preferred forms, even in the same location where dialects are also
> competing). And let's remember that all modern languages continue to evolve
> and borrow a lot from other languages, and new terms are creatively added.
> Finally there are orthographic reforms, but they take a considerable time
> to be adopted or never reach any acceptance, and legacy orthographies
> remain visible in lots of places and publications (plus, people are much
> more mobile today and there are widespread communities located around the
> world that adapt constantly to their new context and on which the official
> reforms have no impact).
>
> So in conclusion, there's no other choice than Unicode today. Unicode is
> mandatory in XML, and in OSM. Don't speak about legacy charsets. But we are
> just concerned with support in fonts: ALL characters encoded up to Unicode
> 9.0 have suitable fonts immediately usable, and these fonts are all free
> for use, and based on TrueType/OpenType. All OSM rendering software should
> be able to use TrueType/OpenType fonts. The only remaining problem is the
> existence of mobile phones that don't have a lot of embedded fonts and
> support a more limited set. But none of them are using or need any legacy
> charsets.
>
>
> Le jeu. 28 nov. 2019 à 15:11, John Whelan <jwhelan0...@gmail.com> a
> écrit :
>
>> The way I would approach this professionally would be to define the
>> requirements first.
>>
>> In this case we have a requirement to display the name in the language of
>> choice.
>>
>> We also have a requirement to be compatible with existing software.
>>
>> Pragmatically I would recommend changing the name field to use only an 8
>> bit Latin alphabet character set recognizing that not all systems can
>> handle more complex character sets.  Which precise character set should be
>> chosen would be a subject for discussion but either ISO-8859-1 or
>> Windows-1252 would be contenders.  My personal preference would be the ISO
>> standard.
>>
>> Unicode is nice but we managed with 6 bit character sets for many years
>> when I started with computers.  Even accented characters were a major
>> problem.  Also remember that .OSM data is in XML format and XML came out of
>> SGML which was first used to transmit documents over modems so only 7 bits
>> were available for encoding characters.  The extended characters use a
>> special escape code sequence to hold the unicode characters.
>>
>> Realistically software never wears out but source code gets lost.
>> Compilers and operating systems get updated.  It may not be possible to
>> modify existing software to handle unicode characters.  I have a perfectly
>> good scanner sitting in the corner that can no longer be used with Win 10
>> because of a new and improved driver.  With the OpenStreetMap environment
>> there isn't even a way to get a complete list of software that uses the
>> OpenStreetMap data so it can be tested.
>>
>> The local language can be added in a name:  then software that can handle
>> the local names can pick it up.  Osmand etc. can be configured to use the
>> local name transparently so the local population can use it in the language
>> of their choice.
>>
>> This approach would appear to meet the requirements.  The argument that
>> we should change all the existing software to meet a requirement that was
>> not clearly defined when the software was written doesn't make sense to me.
>>
>> Cheerio John
>>
>> Frederik Ramm wrote on 2019-11-28 3:25 AM:
>>
>> John,
>>
>> On 28.11.19 01:40, John Whelan wrote:
>>
>> Is there any reason why name:en could not be used?
>>
>> The country's official language requires a "non-standard" font to be
>> available which does not seem to be a given on all platforms. Like if
>> you set up a standard tile server and don't install extra fonts you will
>> see little squares instead of place names all over China.
>>
>> Apparently not all applications are as good in name:xx handling as
>> OsmAnd. A recurring point in the discussion is that the proponents of
>> using the official language say "we shouldn't fall back to English name
>> tags just because some apps/web sites are broken, we should file bug
>> reports with them instead", and the proponents of using English say
>> "let's be pragmatic, there's no way all these apps/sites will be fixed
>> within a short time, so we should use English".
>>
>> Bye
>> Frederik
>>
>>
>>
>> --
>> Sent from Postbox <https://www.postbox-inc.com>
>> _______________________________________________
>> HOT mailing list
>> HOT@openstreetmap.org
>> https://lists.openstreetmap.org/listinfo/hot
>>
