Re: CLDR [terminating]

2018-09-04 Thread Marcel Schneider via Unicode
Sorry for not noticing that this thread belongs to CLDR-users, not to Unicode
Public. Hence I’m taking it off this list, welcoming participants to follow up
there:

https://unicode.org/pipermail/cldr-users/2018-September/000833.html



Re: CLDR

2018-09-04 Thread James Kass via Unicode
(This is the response from Janusz S. Bień which was sent to the public list.)

On Mon, Sep 03 2018 at  1:03 -0800, James Kass wrote:

> Janusz S. Bień wrote,
>
>> Thanks for the link. I found especially interesting the Polish section
>> in
>>
>> https://www.unicode.org/cldr/charts/34/subdivisionNames/other_indo_european.html
>>
>> Looks like complete rubbish, e.g.
>>
>> plpm = Federal Capital Territory(???) = Pomerania (Latin/English name of
>> Pomorze) transliterated into the Greek alphabet (and something in
>> Arabic).
>
> And nothing in Armenian, Albanian, or Pashto.
>
> If you click on the link at "plpm", it takes you right back to that
> same entry on that same page, which doesn't seem very helpful.
>
>> The header of the page says "The coverage depends on the availability of
>> data in wikidata for these names" but I was unable to find this rubbish
>> in Wikidata (but I was not looking very hard).
>
> I tried both "plpm" and "Πομερανία" in the Wikidata search box.  On
> the latter, there were some pages which looked to translate place
> names into various languages, for both Germany and Poland.  I couldn't
> find the exact page, but it would be something like this page:
>
> https://www.wikidata.org/wiki/Q54180
>
> (Clicking "All Entered Languages" on that page gives a lengthy list.)

Thanks! Most data about Poland at

https://www.wikidata.org/wiki/Q36

seem to make sense, but I don't think anybody is using an abbreviation like
"plpm" (for Pomorze/Pomerania).

>
>>> and we really
>>> need to go through the data and correct the many many errors, please.
>>
>> But who is the right person or institution to do it?
>
> If the CLDR information is driven by Wikidata as the file header
> indicates, then Wikidata.

I hope not all CLDR data are driven by Wikidata...

On Mon, Sep 03 2018 at 12:28 +0200, Marcel Schneider wrote:

> On 03/09/18 09:53 Janusz S. Bień via Unicode wrote:
[...]
>> > These comments are designed for the Code Charts and as such must not be
>> > disproportionate in exhaustivity. Eg we have lists of related
>> > languages ending in an ellipsis.
>>
>> Looks like we have different comments in mind.
>
> Then I’m sorry to be off-topic.

Let's say off the original topic. My primary concern is to somehow preserve
comments such as the one at the bottom of page 14 of

https://folk.uib.no/hnooh/mufi/specs/MUFI-CodeChart-4-0.pdf

>
> […]
>> >> > and we really
>> >> > need to go through the data and correct the many many errors, please.
>>
>> But who is the right person or institution to do it?
>
> Software vendors are committed to caring for the data, and may delegate the
> survey to service providers specialized in localization. I also think that
> public language offices should be among the reviewers. Beyond that, and
> especially in the absence of the latter, anybody is welcome to contribute as
> a guest. (Guest votes count as 1 and do not add up with one another.) That
> is consistent with the fact that Unicode relies on volunteers, too.
>
> I’m volunteering to personally welcome you to contribute to CLDR.

Thanks. The interesting question is who is/was already contributing from
Poland or about the Polish language. I vaguely remember a post with this
information, but at that time I was not interested enough to make a
note of it.

>
> […]
>> > Further you will see that while Polish uses the apostrophe
>> > https://slowodnia.tumblr.com/post/136492530255/the-use-of-apostrophe-in-polish
>> > CLDR does not have the correct apostrophe for Polish, as opposed eg to
>> > French.
>>
>> I understand that by "the correct apostrophe" you mean U+2019 RIGHT
>> SINGLE QUOTATION MARK.
>
> Yes.
>
>>
>> > You may wish to note that from now on, both U+0027 APOSTROPHE and
>> > U+0022 QUOTATION MARK are ruled out in almost all locales, given the
>> > preferred characters in publishing are U+2019 and, for Polish, the
>> > U+201E and U+201D that are already found in CLDR pl.
[...]
> It’s a bit confusing because there is a column for English and a column for
> Polish. The characters you retrieved are actually in the English column,
> while Polish has, consistently with By-Type, these quotation marks:
> ' " ” „ « »
> Hence the set is incomplete.

You are right, thanks. But what is the practical importance of it?
I noticed that sometimes in Emacs "forward-word" behaves strangely on a
text with unusual characters, but had no motivation to investigate how
this is related to the current locale.

>>
>> >
>> > Note however that according to the information provided by English
>> > Wikipedia:
>> > https://en.wikipedia.org/wiki/Quotation_mark#Polish
>> > Polish also uses single quotes, which by contrast are still missing in
>> > CLDR.
>>
>> You are right, but who cares? Looks like this has no practical
>> importance. Nobody complains about the wrong use of quotation marks in
>> Polish by Word or OpenOffice, so looks like the software doesn't use
>> this information. So this is rather a matter of aesthetics...
>
> I’ve come to the