Re: Akkha script (used by Eastern Magar language) in ISO 15924?

2019-07-23 Thread Anshuman Pandey via Unicode


> On Jul 23, 2019, at 12:26 AM, Richard Wordingham via Unicode 
>  wrote:
> 
> On Mon, 22 Jul 2019 17:42:37 -0700
> Anshuman Pandey via Unicode  wrote:
> 
>> As I pointed out in L2/11-144, the “Magar Akkha” script is an
>> appropriation of Brahmi, renamed to link it to the primordialist
>> daydreams of an ethno-linguistic community in Nepal. I have never
>> seen actual usage of the script by Magars. If things have changed
>> since 2011, I would very much welcome such information. Otherwise,
>> the so-called “Magar Akkha” is not suitable for encoding. The Brahmi
>> encoding that we have should suffice.
> 
> How would mere usage qualify it as a separate script?

I apologize for using the wrong conjunction. Instead of “otherwise” I should 
have written “nevertheless”.

All my best,
Anshu




Re: Akkha script (used by Eastern Magar language) in ISO 15924?

2019-07-22 Thread Anshuman Pandey via Unicode
As I pointed out in L2/11-144, the “Magar Akkha” script is an appropriation of 
Brahmi, renamed to link it to the primordialist daydreams of an 
ethno-linguistic community in Nepal. I have never seen actual usage of the 
script by Magars. If things have changed since 2011, I would very much welcome 
such information. Otherwise, the so-called “Magar Akkha” is not suitable for 
encoding. The Brahmi encoding that we have should suffice.

All my best,
Anshu

> On Jul 22, 2019, at 10:06 AM, Lorna Evans via Unicode  
> wrote:
> 
> Also: https://scriptsource.org/scr/Qabl
> 
> 
>> On Mon, Jul 22, 2019, 12:47 PM Ken Whistler via Unicode 
>>  wrote:
>> See the entry for "Magar Akkha" on:
>> 
>> http://linguistics.berkeley.edu/sei/scripts-not-encoded.html
>> 
>> Anshuman Pandey did preliminary research on this in 2011.
>> 
>> http://www.unicode.org/L2/L2011/11144-magar-akkha.pdf
>> 
>> It would be premature to assign an ISO 15924 script code, pending the 
>> research to determine whether this script should be separately encoded.
>> 
>> --Ken
>> 
>>> On 7/22/2019 9:16 AM, Philippe Verdy via Unicode wrote:
>>> According to Ethnologue, the Eastern Magar language (mgp) is written in two 
>>> scripts: Devanagari and "Akkha".
>>> 
>>> But the "Akkha" script does not seem to have any ISO 15924 code.
>>> 
>>> The Ethnologue currently assigns a private use code (Qabl) for this script.
>>> 
>>> Was the addition delayed due to lack of evidence (even if this language is 
>>> official in Nepal and India) ?
>>> 
>>> Did the editors of Ethnologue submit an addition request for that script 
>>> (e.g. for the code "Akkh" or "Akha" ?)
>>> 
>>> Or is it considered unified with another script that could explain why it 
>>> is not coded ? If this is a variant it could have its own code (like 
>>> Nastaliq in Arabic). Or maybe this is just a subset of another 
>>> (Sino-Tibetan) script ?
>>> 
>>> 
>>> 
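
For readers wondering why "Qabl" is not a registered code: ISO 15924 sets aside the range Qaaa..Qabx for private use, and Qabl falls inside it. A minimal sketch of that check in Python follows; the function name is made up for illustration and is not part of any library.

```python
def is_private_use_script_code(code: str) -> bool:
    """True if `code` lies in the ISO 15924 private-use range Qaaa..Qabx."""
    # A script code is four ASCII letters, first one uppercase, rest lowercase.
    well_formed = (
        len(code) == 4
        and code.isascii()
        and code.isalpha()
        and code == code.capitalize()
    )
    # With the shape checked, plain string comparison covers the range.
    return well_formed and "Qaaa" <= code <= "Qabx"


print(is_private_use_script_code("Qabl"))  # the code Ethnologue uses for "Akkha"
print(is_private_use_script_code("Deva"))  # a registered code, not private use
```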


Fwd: L2/18-181

2018-05-16 Thread Anshuman Pandey via Unicode
> On May 16, 2018, at 3:46 PM, Doug Ewell via Unicode  
> wrote:
>
> http://www.unicode.org/L2/L2018/18181-n4947-assamese.pdf
>
> This is a fascinating proposal to disunify the Assamese script from
> Bengali on the following bases:

‘Fascinating’ is not a term I’d use for this proposal.

If folks are interested in a valid proposal for disunification of
Bengali, please look at the proposal for Tirhuta.

> 1. The identity of Assamese as a script distinct from Bengali is in
> jeopardy.

This is not a technical matter. Moreover, it's typical rhetoric used by
various language communities in South Asia. Fairly standard fare for
those familiar with such issues.

The proposal needs to show how the two scripts differ, i.e. conjuncts,
CV ligatures, etc. The number forms are similar to those already
encoded. Again, cf. Tirhuta.

> 2. Collation is different between the Assamese and Bengali languages,
> and code point order should reflect collation order.

The same issue applies to dictionary order for Hindi and Marathi, which
differs from the conventional Sanskrit order for Devanagari.
Orthographies for various languages put conjuncts and other things at
the end, which are not considered atomic letters. Nothing special in
this regard for Assamese and Bengali.
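
To make the point concrete, here is a rough Python sketch of language-specific ordering layered on top of one shared encoding. It assumes, purely for illustration, that the Assamese order places the conjunct ক্ষ after হ; the weights and the name `tailored_key` are invented for this example, and real collation would go through a proper tailoring rather than this toy key.

```python
KSSA = "\u0995\u09CD\u09B7"  # ক্ষ = KA + VIRAMA + SSA, three code points
HA = "\u09B9"                # হ

def tailored_key(word):
    """Sort key that treats ক্ষ as one unit ordered just after হ (assumed order)."""
    key = []
    i = 0
    while i < len(word):
        if word.startswith(KSSA, i):
            # Illustrative weight: halfway between হ and the next code point.
            key.append(ord(HA) + 0.5)
            i += len(KSSA)
        else:
            # Everything else keeps its code point as its weight.
            key.append(float(ord(word[i])))
            i += 1
    return key

letters = [KSSA, HA, "\u0996", "\u0995"]      # ক্ষ, হ, খ, ক
print(sorted(letters))                        # code point order: ক, ক্ষ, খ, হ
print(sorted(letters, key=tailored_key))      # tailored order:   ক, খ, হ, ক্ষ
```

The encoded order never changes; only the key function does, which is why code point order does not need to mirror any one language's dictionary order.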

> 3. Keyboard design is more difficult because consonants like ক্ষ
> are encoded as conjunct forms instead of atomic characters.

Ignorant question on my part: is it difficult to use character
sequences as labels for keys? I see keys for both क्ष and ज्ञ on the
iOS Hindi keyboard, and त्र is tucked away under त.
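
As a small illustration of what such keys would emit, the Python sketch below lists the code points behind those three labels. The table `KEY_OUTPUT` and its key names are hypothetical and do not describe how any actual keyboard is implemented.

```python
import unicodedata

# Hypothetical key table: one key press emits a whole character sequence.
KEY_OUTPUT = {
    "kssa": "\u0915\u094D\u0937",  # क्ष
    "jnya": "\u091C\u094D\u091E",  # ज्ञ
    "tra": "\u0924\u094D\u0930",   # त्र
}

def describe(sequence):
    """Return the Unicode character names making up a key's output."""
    return [unicodedata.name(ch) for ch in sequence]

for key, sequence in KEY_OUTPUT.items():
    print(key, sequence, len(sequence), describe(sequence))
```

Each label is a sequence of three code points (consonant, virama, consonant), yet it can sit on a single key.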

> 4. The use of a single encoded script to write two languages forces
> users to use language identifiers to identify the language.

Same applies to each of the 40+ varieties of Hindi, as well as
Marathi, etc. Another ignorant question: how to identify the various
languages that use Arabic and Cyrillic?

> 5. Transliteration of Assamese into a different script is problematic
> because letters have different phonological value in Assamese and
> Bengali.

Transliteration or transcription? In any case, this applies to other
languages written using similar scripts: a Marathi speaker pronounces
ज and ऋ differently than a Hindi speaker does.

> It will be interesting to see where this proposal goes.

Hopefully, it does not go too far. What it proposes is contrary to
Unicode and redundant.

> Given that all
> or most of these issues can be claimed for English, French, German,
> Spanish, and hundreds of other languages written in the Latin script, if
> the Assamese proposal is approved we can expect similar disunification
> of the Latin script into language-specific alphabets in the future.

Fascinating. I mean, terrible.

All my best,
Anshuman



Re: 0027, 02BC, 2019, or a new character?

2018-02-20 Thread Anshuman Pandey via Unicode


> On Feb 20, 2018, at 9:49 PM, James Kass via Unicode  
> wrote:
> 
> Michael Everson wrote:
> 
>> Orthographic harmonization between these languages can ONLY help any
>> speaker of one to access information in any of the others. That expands
>> people’s worlds. That would be a good goal.
> 
> Wouldn't dream of arguing with that.  Expanding people's worlds is why
> many of us have supported Unicode.

Agreed!

> The good news is that the thread title question is moot.

Yes, now let’s please return to discussing emoji.

All my best,
Anshu


End of discussion, please — Re: Why so much emoji nonsense?

2018-02-15 Thread Anshuman Pandey via Unicode


> On Feb 15, 2018, at 10:58 PM, Pierpaolo Bernardi via Unicode 
>  wrote:
> 
> On Fri, Feb 16, 2018 at 4:26 AM, James Kass via Unicode
>  wrote:
> 
>> The best time to argue against the addition of emoji to Unicode would be
>> 2007 or 2008, but you'd be wasting your time travel.  Trust me.
> 
> But it's always a good time to argue against the addition of more
> nonsense to what we already have got.

I think it’s a good time to end this conversation. Whether ‘nonsense’ or not, 
emoji are here and they’re in Unicode. This conversation has itself become 
nonsense, d’y’all agree?

The amount of time that people have spent on this discussion could’ve been 
directed towards work on any one of the unencoded scripts listed at:

 http://www.linguistics.berkeley.edu/sei/scripts-not-encoded.html

As many have noted during this discussion, the emoji “ship has already sailed”. 
I’d’ve jumped aboard sooner, but this metaphor is now also quite tired. 

All my best,
Anshu



Re: Emoji for major planets at least?

2018-01-18 Thread Anshuman Pandey via Unicode
Proposals for planet emoji were submitted in April 2017:

https://www.unicode.org/L2/L2017/17100-planet-emoji-seq.pdf

http://www.unicode.org/L2/L2017/17100r-planet-emoji-seq.pdf

I’m not sure what the result was.

Anshu


> On Jan 18, 2018, at 12:46 PM, Asmus Freytag (c) via Unicode 
>  wrote:
> 
>> On 1/18/2018 10:01 AM, John H. Jenkins wrote:
>> Well, you can go with Venus = white planet, Mercury = grey planet, Uranus = 
>> greenish planet, Neptune = bluish planet, Jupiter = striped planet.
>> 
>> As you say, though, without a context, none of them convey much and Venus, 
>> at least, would just be a circle. 
>> 
>> Plus there's the question of the context in which someone would want to send 
>> little pictures of the planets. This sounds like it would be adding emoji 
>> just because.
> 
> "Earth" as in "a blue ball in space" is something that reached iconic status 
> after the famous photo taken during the early Apollo missions. I could 
> definitely see that used in a variety of possible contexts. And the 
> recognition value is higher than for many recent emoji.
> 
> Saturn, with its rings (even though it's no longer the only one known with 
> rings) also is iconic and highly recognizable. I lack imagination as to when 
> someone would want to use it in communication, but I have the same issue with 
> quite a few recent emoji, some of which are far less iconic or recognizable. 
> I think it does lend itself to describe a "non-earth" type planet, or even 
> the generic idea of a planet (as opposed to a star/sun).
> 
> Mars and Venus have tons of connotations, which could be expressed by using 
> an emoji (as opposed to the astrological symbol for each), but only Mars is 
> reasonably recognizable without lots of pre-established context. That red 
> color.
> 
> In a detailed enough rendering, Jupiter, as a shaded "ball" with stripes and 
> a red dot, would be more recognizable than any of the remaining planets (on 
> par with or better than many recent emoji), but I see even less scope for using it 
> metaphorically or in extended contexts.
> 
> If someone were to make a proposal, I would suggest to them to limit it to 
> these four and to provide more of a suggestion as to how these might show up 
> in use.
> 
> A./
>> 
>>> On Jan 18, 2018, at 10:44 AM, Asmus Freytag via Unicode 
>>>  wrote:
>>> 
>>>> On 1/18/2018 6:55 AM, Shriramana Sharma via Unicode wrote:
>>>> Hello people.
>>>> 
>>>> We have sun, earth and moon emoji (3 for the earth and more for the
>>>> moon's phases). But we don't have emoji for the rest of the planets.
>>>> 
>>>> We have astrological symbols for all the planets and a few
>>>> non-existent imaginary "planets" as well.
>>>> 
>>>> Given this, would it be impractical to encode proper emoji characters
>>>> for the rest of the planets, at least the major ones whose physical
>>>> characteristics are well known and identifiable?
>>>> 
>>>> I mean for example identifying Sedna and Quaoar
>>>> (https://en.wikipedia.org/wiki/File:EightTNOs.png) is probably not
>>>> going to be practical for all those other than astronomy buffs but the
>>>> physical shapes of the major planets are known to all high school
>>>> students…
>>>> 
>>> Earth = blue planet (with clouds)
>>> 
>>> Mars = red planet
>>> 
>>> Saturn = planet with rings
>>> 
>>> I don't think any of the other ones are identifiable in a context-free 
>>> setting, unless you draw a "big planet with red dot" for Jupiter.
>>> 
>>> Earth would have to be depicted in a way that doesn't focus on 
>>> "hemispheres", or you miss the idea of it as "planet".
>>> 
>>> 
>>> 
>>> A./
>>> 
>>> 
>>> 
>> 
> 


The need for a basic register of emoji submissions

2017-08-31 Thread Anshuman Pandey via Unicode
There is a need for a basic register of proposals that have been
submitted to the Emoji Subcommittee. Currently, emoji proposals are
posted to the UTC register after they have been reviewed by the ESC as
being actionable by the UTC. For proposals that make the cut, some
time can pass between the date of submission and the date they are
posted. For proposals that are deemed unsuitable, there is simply no
public record.

Consequently, there is no way to know if a particular emoji has been
proposed, either while a submitted proposal is being reviewed or if a
proposal has been rejected. The "Submitting Emoji Proposals" page at
http://unicode.org/emoji/selection.html quixotically notifies the
reader using bold face to "check the Emoji List to make sure your
proposal is new": this list contains emoji that have already been
encoded.

This is a problem. There have been three instances where I have worked
on emoji proposals only to later learn that they were already proposed
earlier. And I learned that only because I check the UTC register
frequently for my script encoding efforts. If there were a basic
register of emoji submissions, I could have easily checked it and
saved the hours I spent in drawing up documents.

The de facto rationale for not posting emoji proposals to the UTC
register right away is that 'there are too many proposals that are
unactionable or of insufficient quality'. But, I think this rationale
does not hold water too well. A basic task of a standards subcommittee
is to maintain a list of artifacts that pertain to its function. For
the ESC, these artifacts include all emoji submissions. And a list of
these artifacts can easily be made available at
http://unicode.org/emoji. Then, instead of pointing prospective
emoji proposal authors to a list of already encoded emoji, the page can
point them to a list of emoji submissions.

This basic register can be as simple as a list of names. If the ESC
wishes to not post other details, that is fine. I am not asking for a
Roadmap.

I see from the announcement made yesterday that the ESC now has (at
least) four members. Congratulations to the new members, who I believe
to be highly capable of maintaining a simple public list of emoji
submissions in short time.

All my best,
Anshu


Re: Comparing Raw Values of the Age Property

2017-05-22 Thread Anshuman Pandey via Unicode
I performed several operations on DerivedAge.txt a few months ago. One basic 
example here:

https://pandey.github.io/posts/unicode-growth-UCD-python.html

If you provide some more insight into your objective, I might be able to help.

I would recommend against relying on the order of the data, and that you 
instead parse the individual entries to obtain the 'Age' property.
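
A minimal sketch of that kind of parsing in Python, assuming the data lines keep the shape "code point or range ; short value" and that the short values stay dot-separated decimal numbers (an assumption, since no guarantee has been identified in this thread). The names `parse_age` and `parse_derived_age` are invented for this example, and the sample lines only imitate the file's format rather than quote it.

```python
import re

# One data line: a code point or range, a semicolon, then the short Age value.
LINE_RE = re.compile(
    r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\d+\.\d+)"
)

def parse_age(value):
    """Turn a short Age value such as '10.0' into a tuple of ints, (10, 0)."""
    return tuple(int(part) for part in value.split("."))

def parse_derived_age(lines):
    """Yield (first, last, age) per data line; comments and blanks are skipped."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            first = int(match.group(1), 16)
            last = int(match.group(2) or match.group(1), 16)
            yield first, last, parse_age(match.group(3))

# Illustrative lines in the file's format, not copied from DerivedAge.txt.
sample = [
    "# comment line",
    "0000..001F    ; 1.1 #  [32] ...",
    "0220          ; 3.2 #       ...",
]
for entry in parse_derived_age(sample):
    print(entry)

# Tuples of ints compare numerically, so 10.0 sorts after 2.0.
print(sorted(["2.0", "10.0", "1.1"], key=parse_age))  # ['1.1', '2.0', '10.0']
```

Comparing the parsed tuples sidesteps the string-comparison problem, at the cost of relying on the dot as the field divider.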

All my best,
Anshu


> On May 22, 2017, at 4:44 PM, Richard Wordingham via Unicode 
>  wrote:
> 
> Given two raw values of the Age property, defined in UCD file
> DerivedAge.txt, how is a computer program supposed to compare them?
> Apart from special handling for the value "Unassigned" and its short
> alias "NA", one used to be able to compare short values against short
> values and long values against long values by simple string
> comparison.  However, now we are coming to Version 10.0 of Unicode,
> this no longer works - "1.1" < "10.0" < "2.0".
> 
> There are some possibilities - the values appear in order in
> PropertyValueAliases.txt and in DerivedAge.txt.  However, I can find no
> relevant guarantees in UAX#44.  I am looking for a solution that can be
> driven by the data files, rather than requiring human thought at every
> version release.  Can one rely on the FULL STOP being the field
> divider, and can one rely on there never being any grouping characters
> in the short values?  Again, I could find no guarantees.
> 
> Richard.


Re: Counting Devanagari Aksharas

2017-04-20 Thread Anshuman Pandey via Unicode

> On Apr 20, 2017, at 8:19 PM, Richard Wordingham via Unicode 
>  wrote:
> 
> On Thu, 20 Apr 2017 14:14:00 -0700
> Manish Goregaokar via Unicode  wrote:
> 
>> On Thu, Apr 20, 2017 at 12:14 PM, Richard Wordingham via Unicode
>>  wrote:
> 
>>> On Thu, 20 Apr 2017 11:17:05 -0700
>>> Manish Goregaokar via Unicode  wrote:
> 
>>>> I'm of the opinion that Unicode should start considering devanagari
>>>> (and possibly other indic) consonant clusters as single extended
>>>> grapheme clusters.
> 
>>> You won't like it if cursor movement granularity is reduced to one
>>> extended grapheme cluster.  I'm grateful that Emacs allows me to
> 
>> I mean, we do the same for Hangul.
> 
> Hangul is generally a maximum of three characters, which is about the
> border of tolerance. I find it irritating to have to completely retype
> Thai grapheme clusters of consonant, vowel and tone mark.  There were
> loud protests from the Thais when preposed vowels were added to the
> Thai grapheme cluster and implementations then responded, and Unicode
> quickly removed them. Now imagine you're typing Vedic Sanskrit, with its
> clusters and pitch indicators.

I tried typing Vedic Sanskrit, and it seems to work:

http://pandey.pythonanywhere.com/devsyll

Haven't tried the orthographic oddity of the Nepali case in question. Above my 
pay grade.

If you access the above link on an iOS device you'll see tofu and missing 
characters. Apple's Devanagari font needs to be fixed.
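
For anyone who wants to experiment, here is a deliberately simplified Python sketch of grouping Devanagari text into aksharas. It is not the code behind the link above. It assumes that a consonant following a virama joins the current cluster and that every combining mark attaches to what precedes it, and it ignores ZWJ/ZWNJ and the Nepali case mentioned earlier. The function names are invented for this example.

```python
import unicodedata

VIRAMA = "\u094D"

def is_consonant(ch):
    """Devanagari consonants KA..HA plus the precomposed nukta letters."""
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"

def is_mark(ch):
    """Combining marks: vowel signs, virama, nukta, anusvara, Vedic accents."""
    return unicodedata.category(ch) in ("Mn", "Mc")

def aksharas(text):
    """Split text into rough aksharas (conjuncts stay in one cluster)."""
    clusters = []
    for ch in text:
        joins_conjunct = (
            bool(clusters)
            and is_consonant(ch)
            and clusters[-1].endswith(VIRAMA)
        )
        if clusters and (is_mark(ch) or joins_conjunct):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

print(aksharas("\u0928\u092E\u0938\u094D\u0924\u0947"))  # नमस्ते
print(len(aksharas("\u0915\u094D\u0937\u0924\u094D\u0930")))  # क्ष + त्र
```

Spaces and punctuation come out as their own clusters here, so filter them before counting.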

- AP