Re: Counting Devanagari Aksharas

2017-04-23 Thread Richard Wordingham via Unicode
On Sun, 23 Apr 2017 05:40:29 +0300
Eli Zaretskii via Unicode  wrote:

> > The cursor moves to the cluster boundary, so there is much less of a
> > problem with Emacs.  
> 
> But you wanted to highlight only part of the cluster, AFAIU.

If I search for CGJ, highlighting it is frequently supremely useless.
I want to know where it is; highlighting is merely a tool to find it on
the screen.
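
As a rough illustration of "knowing where it is" independently of any
highlighting, the sketch below reports the code point offsets of CGJ
(U+034F) in a string. The function name find_cgj is made up for this
example; it is not part of Emacs or any other editor.

```python
CGJ = "\u034F"  # COMBINING GRAPHEME JOINER


def find_cgj(text):
    """Return the code point offsets at which CGJ occurs in text."""
    return [i for i, ch in enumerate(text) if ch == CGJ]


# The offsets can then be mapped to screen positions by whatever
# display layer is in use; that part is outside this sketch.
print(find_cgj("a\u034Fb\u034F"))
```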

Richard.


Re: Counting Devanagari Aksharas

2017-04-23 Thread Naena Guru via Unicode
The Unicode approach to Sanskrit and all Indic scripts is flawed. Indic 
scripts should not be treated as letter-assembly systems.


Sanskrit vyaakaraNa (grammar) explains phonemes as the atoms of 
speech. Each writing system then assigns a shape to each phonetically 
precise phoneme.


The most technically and grammatically proper solution for Indic is 
first to ROMANIZE the group of writing systems at the level of phonemes. 
That is, assign romanized shapes to vowels, consonants, prenasals, 
post-vowel phonemes (anusvara and visarjaniiya with its allophones) etc. 
This approach is similar to how European languages picked up Latin, 
improvised on the script, and even use a repertoire of simple and capital letters. 
Romanizing immediately makes typing easier and eliminates sometimes 
embarrassing ambiguity in Anglicizing -- you type phonetically on key 
layouts close to QWERTY. (Only four positions are different in Romanized 
Sinhala layout).


If we drop the capitalizing rules and utilize caps to indicate the 
'other' forms of a common letter, we get an intuitively typed system for 
each language, and readable too. When this is done carefully, comparing 
phoneme sets of the languages, we can reach a common set of 
Latin-derived SINGLE-BYTE letters completely covering all phonemes of 
all Indic.


Next, each native script can be obtained by making orthographic smart 
fonts that display the SBCS codes in the respective shapes of the native 
scripts.


I have successfully romanized Sinhala and revived the full repertoire of 
Sinhala + Sanskrit orthography, losing nothing. Sinhala script is perhaps 
the most complex of all Indic because it is used to write both Sanskrit 
and Pali.


See this: http://ahangama.com/ (It's all SBCS underneath).
Test here: http://ahangama.com/edit.htm


On 4/20/2017 5:05 AM, Richard Wordingham via Unicode wrote:

Is there consensus on how to count aksharas in the Devanagari script?
The doubts I have relate to a visible halant in orthographic syllables
other than the first.

For example, according to 'Devanagari VIP Team Issues Report'
http://www.unicode.org/L2/L2011/11370-devanagari-vip-issues.pdf, a
derived form from Nepali श्रीमान् should be written श्रीमान्‌को and not
श्रीमान्को.  Now, if the font used has a conjunct for SH.RA, I would
count the former as having 4 aksharas (SH.RII, MAA, N, KO) and the
latter as having 3 aksharas (SH.RII, MAA, N.KO).
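
The 4-versus-3 count above can be reproduced from the code points alone
with a minimal sketch like the following. It assumes that every
virama-joined consonant forms a conjunct (that is, it ignores what a
particular font actually renders) and that ZWNJ after a virama ends the
akshara. The function names and the simplified character ranges are
choices made for this example, not a standard definition.

```python
VIRAMA = "\u094D"
ZWNJ = "\u200C"
ZWJ = "\u200D"


def is_consonant(ch):
    # Simplified: KA..HA plus the nukta consonants QA..YYA.
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"


def is_independent_vowel(ch):
    # Simplified: SHORT A..AU only.
    return "\u0904" <= ch <= "\u0914"


def count_aksharas(text):
    """Count aksharas, assuming virama always joins unless ZWNJ follows."""
    count = 0
    joins_next = False  # True right after a virama that may join a consonant
    for ch in text:
        if is_consonant(ch):
            if not joins_next:
                count += 1  # consonant starts a new akshara
            joins_next = False
        elif is_independent_vowel(ch):
            count += 1
            joins_next = False
        elif ch == VIRAMA:
            joins_next = True
        elif ch == ZWJ:
            pass  # leave the joining state unchanged
        else:
            # ZWNJ, vowel signs, modifiers and anything else break the join.
            joins_next = False
    return count


with_zwnj = "\u0936\u094D\u0930\u0940\u092E\u093E\u0928\u094D\u200C\u0915\u094B"
without_zwnj = "\u0936\u094D\u0930\u0940\u092E\u093E\u0928\u094D\u0915\u094B"
print(count_aksharas(with_zwnj), count_aksharas(without_zwnj))
```

A font-aware count would need shaping information that is not available
at this level.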

If the font leads to the use of a visible halant instead of the vattu
conjunct SH.RA, as happens when I view this email, would there then be
5 and 4 aksharas respectively?  A further complication is that the font
chosen treats what looks like SH, RA as a conjunct; the vowel I appears
to the left of SH when added after RA (श्रि).

Richard.





Re: Counting Devanagari Aksharas

2017-04-23 Thread Asmus Freytag via Unicode

  
  
On 4/22/2017 9:25 PM, Manish Goregaokar via Unicode wrote:


> Backspace in browsers (Chrome and Firefox) deletes within EGCs too.
> They delete matras in Devanagari, and jamos in Hangul. They don't
> *exactly* work off of code points (e.g. flag emoji gets deleted as a
> whole in many backspace implementations).

Flag emoji and many other "invisible" sequences
are different from ligatures and conjuncts in one important way:
their elements are not usually key strokes, but the full
sequence would be inserted from a pick list or other type of
input method. If you didn't "type" each of the elements of the
sequence, then deleting individual ones is something you would
only need for debugging or other specialized purposes, not for
undoing a physical action (keystroke) in reverse order.
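
The behaviour described in the quoted message can be modelled with a toy
sketch: backspace removes one code point, except that a trailing flag
(a pair of regional indicator symbols) is removed as a unit. This is an
assumption-laden illustration, not the algorithm any browser actually
uses; the function names are invented here.

```python
def is_regional_indicator(ch):
    # REGIONAL INDICATOR SYMBOL LETTER A..Z
    return "\U0001F1E6" <= ch <= "\U0001F1FF"


def backspace(text):
    """Delete one code point, or a whole trailing flag pair."""
    if not text:
        return text
    # Count the run of regional indicators at the end of the text.
    run = 0
    for ch in reversed(text):
        if not is_regional_indicator(ch):
            break
        run += 1
    # An even run ends in a complete flag, so remove both halves.
    if run >= 2 and run % 2 == 0:
        return text[:-2]
    return text[:-1]


ki = "\u0915\u093F"                   # KA + vowel sign I
han = "\u1112\u1161\u11AB"            # conjoining jamos for one syllable
flag = "A\U0001F1EB\U0001F1F7"        # "A" followed by a two-symbol flag
print([backspace(s) for s in (ki, han, flag)])
```

Here the matra and the final jamo are removed individually, while the
flag disappears in one step.
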

Speaking of undoing: not all editors always support full
key-stroke by key-stroke undo; some will coalesce longer runs of
text. This saves on space for the undo buffer, but also makes
undoing more extensive edits less painful. It's clearly a personal
preference whether such "streamlining" would feel "right" or
"bothersome".

Beyond the last line typed, or two, I may really not care if undo
went word by word, say.
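
A word-by-word undo of the kind mentioned above might be sketched as
follows. The class UndoStack and its coalescing rule (merge consecutive
characters of the same kind, whitespace or non-whitespace, into one undo
entry) are assumptions made for illustration; real editors use their own
policies.

```python
class UndoStack:
    """Toy undo buffer that coalesces typed characters into runs."""

    def __init__(self):
        self.text = ""
        self.entries = []  # each entry is one undoable run of typed text

    def type_char(self, ch):
        self.text += ch
        # Extend the current run if ch is the same kind (space or not)
        # as the run's last character; otherwise start a new run.
        if self.entries and self.entries[-1][-1].isspace() == ch.isspace():
            self.entries[-1] += ch
        else:
            self.entries.append(ch)

    def undo(self):
        if not self.entries:
            return
        run = self.entries.pop()
        self.text = self.text[:-len(run)]


stack = UndoStack()
for ch in "ab cd":
    stack.type_char(ch)
stack.undo()  # removes the whole word "cd", not just "d"
print(repr(stack.text))
```

Five keystrokes are stored as three entries here, which is the space
saving mentioned above.
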
A./