Re: Counting Devanagari Aksharas

2017-04-26 Thread Eli Zaretskii via Unicode
> Date: Wed, 26 Apr 2017 07:45:07 +0100 > From: Richard Wordingham via Unicode > > On Wed, 26 Apr 2017 08:48:13 +0300 > Eli Zaretskii via Unicode wrote: > > > > Date: Sun, 23 Apr 2017 22:59:49 +0100 > > > From: Richard Wordingham

Re: Counting Devanagari Aksharas

2017-04-26 Thread Richard Wordingham via Unicode
On Wed, 26 Apr 2017 08:48:13 +0300 Eli Zaretskii via Unicode wrote: > > Date: Sun, 23 Apr 2017 22:59:49 +0100 > > From: Richard Wordingham > > Cc: Eli Zaretskii > > > > If I search for CGJ, highlighting it is frequently

Re: Counting Devanagari Aksharas

2017-04-25 Thread Eli Zaretskii via Unicode
> Date: Sun, 23 Apr 2017 22:59:49 +0100 > From: Richard Wordingham > Cc: Eli Zaretskii > > If I search for CGJ, highlighting it is frequently supremely useless. > I want to know where it is; highlighting is merely a tool to find it on > the screen.

Re: Go romanize! Re: Counting Devanagari Aksharas

2017-04-25 Thread Naena Guru via Unicode
Quote from below: The word indeed means 'danger' (Pali/Sanskrit _antarāya_). The pronunciation is /ʔontʰalaːi/; the Tai languages that use(d) the Tai Tham script no longer have /r/. The older sequence /tr/ normally became /tʰ/ (except in Lao), but the spelling has not been updated - at least,

Re: Go romanize! Re: Counting Devanagari Aksharas

2017-04-24 Thread Richard Wordingham via Unicode
On Mon, 24 Apr 2017 20:53:12 +0530 Naena Guru via Unicode wrote: > Quote by Richard: > Unless this implies a spelling reform for many languages, I'd like to > see how this works for the Tai Tham script. I'm not happy with the > Romanisation I use to work round hostile

Go romanize! Re: Counting Devanagari Aksharas

2017-04-24 Thread Naena Guru via Unicode
Quote by Richard: Unless this implies a spelling reform for many languages, I'd like to see how this works for the Tai Tham script. I'm not happy with the Romanisation I use to work round hostile rendering engines. (My scheme is only documented in variable hack_ss02 in the last script blocks of

Re: Counting Devanagari Aksharas

2017-04-24 Thread Richard Wordingham via Unicode
On Mon, 24 Apr 2017 00:36:26 +0530 Naena Guru via Unicode wrote: > The Unicode approach to Sanskrit and all Indic is flawed. Indic > should not be letter-assembly systems. > > Sanskrit vyaakaraNa (grammar) explains the phonemes as the atoms of > the speech. Each writing

Re: Counting Devanagari Aksharas

2017-04-23 Thread Richard Wordingham via Unicode
On Sun, 23 Apr 2017 05:40:29 +0300 Eli Zaretskii via Unicode wrote: > > The cursor moves to the cluster boundary, so there is much less of a > > problem with Emacs. > > But you wanted to highlight only part of the cluster, AFAIU. If I search for CGJ, highlighting it is

Re: Counting Devanagari Aksharas

2017-04-23 Thread Naena Guru via Unicode
The Unicode approach to Sanskrit and all Indic is flawed. Indic should not be letter-assembly systems. Sanskrit vyaakaraNa (grammar) explains the phonemes as the atoms of the speech. Each writing system then assigns a shape to the phonetically precise phoneme. The most technically and

Re: Counting Devanagari Aksharas

2017-04-23 Thread Asmus Freytag via Unicode
On 4/22/2017 9:25 PM, Manish Goregaokar via Unicode wrote: Backspace in browsers (chrome and firefox) deletes within EGCs too. They delete matras in devanagari, and jamos in hangul. They don't *exactly* work off of code points (e.g. flag emoji gets deleted as a

Re: Counting Devanagari Aksharas

2017-04-22 Thread Manish Goregaokar via Unicode
> You cannot even > meaningfully move by single characters in most clusters, because > composing characters generally completely changes how the original > characters looked, so there's nowhere you can display the cursor. Yes, and this is one of the reasons it feels broken in devanagari, you get

Re: Counting Devanagari Aksharas

2017-04-22 Thread Eli Zaretskii via Unicode
> Date: Sun, 23 Apr 2017 00:51:59 +0100 > Cc: Julian Bradfield > From: Richard Wordingham via Unicode > > On Sat, 22 Apr 2017 21:39:42 +0100 (BST) > Julian Bradfield via Unicode wrote: > > > On 2017-04-22, Eli Zaretskii via

Re: Counting Devanagari Aksharas

2017-04-22 Thread Richard Wordingham via Unicode
On Sat, 22 Apr 2017 21:39:42 +0100 (BST) Julian Bradfield via Unicode wrote: > On 2017-04-22, Eli Zaretskii via Unicode wrote: > > I could imagine Emacs decomposing characters temporarily when only > > part of a cluster matches the search string.

Re: Counting Devanagari Aksharas

2017-04-22 Thread Julian Bradfield via Unicode
On 2017-04-22, Eli Zaretskii via Unicode wrote: >> From: Richard Wordingham via Unicode [...] >> I've encountered the problem that, while at least I can search for >> text smaller than a cluster, there's no indication in the window of >> where in the

Re: Counting Devanagari Aksharas

2017-04-22 Thread Eli Zaretskii via Unicode
> Date: Sat, 22 Apr 2017 17:13:36 +0100 > From: Richard Wordingham via Unicode > > > Movement by grapheme > > cluster is AFAIK the most natural way of moving in complex scripts. > > Evidence? Personal experience? > It's easiest for displaying the cursor. It's the _only_

Re: Counting Devanagari Aksharas

2017-04-22 Thread Richard Wordingham via Unicode
On Sat, 22 Apr 2017 13:34:32 +0300 Eli Zaretskii via Unicode wrote: > AFAIR, Emacs allows one to _delete_ individual characters, > i.e. Backspace and C-d delete character-by-character, so the problem > shouldn't be so grave for imperfect typists. Deleting forwards by one

Re: Counting Devanagari Aksharas

2017-04-22 Thread Eli Zaretskii via Unicode
> Date: Sat, 22 Apr 2017 11:13:16 +0100 > From: Richard Wordingham via Unicode > > At present these are split into two and three grapheme clusters > respectively, and LibreOffice cursor movement responds accordingly. > (SIGN AA starts a grapheme cluster in several scripts of

Re: Counting Devanagari Aksharas

2017-04-22 Thread Richard Wordingham via Unicode
On Fri, 21 Apr 2017 16:27:43 -0700 Manish Goregaokar via Unicode wrote: > > Do Hindi speakers really think of orthographic syllables as > > characters? > > When rendered as a cluster, yes? I've asked around, and folks seem to > insist on coupling it to the rendering.

Re: Counting Devanagari Aksharas

2017-04-21 Thread Manish Goregaokar via Unicode
> Do Hindi speakers really think of orthographic syllables as characters? When rendered as a cluster, yes? I've asked around, and folks seem to insist on coupling it to the rendering. Given most fonts render *normal* (common, etc) clusters, I think making them EGCs and looking at nonrendered

Re: Counting Devanagari Aksharas

2017-04-21 Thread Richard Wordingham via Unicode
On Thu, 20 Apr 2017 11:17:05 -0700 Manish Goregaokar via Unicode wrote: > On Wed, Apr 19, 2017 at 4:35 PM, Richard Wordingham via Unicode > wrote: > > Is there consensus on how to count aksharas in the Devanagari > > script? The doubts I have relate to

Re: Counting Devanagari Aksharas

2017-04-21 Thread Manish Goregaokar via Unicode
That seems like a relatively niche use case (especially with Vedic Sanskrit) compared to having weird selection for everything else. I'm not convinced. When I use a romanized Devanagari input method (I typically do on my laptop), deleting the whole cluster is necessary anyway for things to work

Re: Counting Devanagari Aksharas

2017-04-21 Thread Richard Wordingham via Unicode
On Fri, 21 Apr 2017 00:08:24 -0500 Anshuman Pandey via Unicode wrote: > > On Apr 20, 2017, at 8:19 PM, Richard Wordingham via Unicode > > wrote: > > Now imagine you're > > typing Vedic Sanskrit, with its clusters and pitch indicators. > I tried

Re: Counting Devanagari Aksharas

2017-04-20 Thread Anshuman Pandey via Unicode
> On Apr 20, 2017, at 8:19 PM, Richard Wordingham via Unicode > wrote: > > On Thu, 20 Apr 2017 14:14:00 -0700 > Manish Goregaokar via Unicode wrote: > >> On Thu, Apr 20, 2017 at 12:14 PM, Richard Wordingham via Unicode >> wrote:

Re: Counting Devanagari Aksharas

2017-04-20 Thread Richard Wordingham via Unicode
On Thu, 20 Apr 2017 14:14:00 -0700 Manish Goregaokar via Unicode wrote: > On Thu, Apr 20, 2017 at 12:14 PM, Richard Wordingham via Unicode > wrote: > > On Thu, 20 Apr 2017 11:17:05 -0700 > > Manish Goregaokar via Unicode wrote: >

Re: Counting Devanagari Aksharas

2017-04-20 Thread Manish Goregaokar via Unicode
I mean, we do the same for Hangul. The main time you need intra-conjunct segmentation in Devanagari is when deleting something you just typed. And backspace usually operates on code points anyway (except for some weird cases like flag emoji, though this isn't uniform across platforms). I don't
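
The backspace behaviour described in this message can be sketched as follows. This is only an illustration, not any browser's or platform's actual logic: the function name `backspace` and the helper `is_regional_indicator` are hypothetical, and the flag handling is a simplification that pairs regional indicator symbols from the start of a trailing run.

```python
def is_regional_indicator(ch: str) -> bool:
    # Regional indicator symbols occupy U+1F1E6..U+1F1FF; a flag emoji is a pair of them.
    return "\U0001F1E6" <= ch <= "\U0001F1FF"


def backspace(text: str) -> str:
    """Delete one code point, except that a complete flag pair is deleted as a unit.

    Hypothetical sketch of "backspace operates on code points, apart from
    special cases such as flag emoji".
    """
    if not text:
        return text
    # Count the run of regional indicators at the end of the text.
    run = 0
    while run < len(text) and is_regional_indicator(text[-1 - run]):
        run += 1
    # An even, non-empty run ends in a complete flag: remove both halves.
    if run > 0 and run % 2 == 0:
        return text[:-2]
    # Otherwise remove a single code point (e.g. just the matra of a cluster).
    return text[:-1]


# KA + VIRAMA + SSA + VOWEL SIGN I: one backspace removes only the vowel sign.
kshi = "\u0915\u094D\u0937\u093F"
print([f"U+{ord(c):04X}" for c in backspace(kshi)])
```

Under this sketch, deleting inside a conjunct leaves the remaining code points of the cluster in place, which is the intra-conjunct editing the message refers to.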

Re: Counting Devanagari Aksharas

2017-04-20 Thread Richard Wordingham via Unicode
On Thu, 20 Apr 2017 11:17:05 -0700 Manish Goregaokar via Unicode wrote: > When given a rendered representation people seem to uniformly count > conjuncts as multiple aksharas if rendered with visible halant, and as > a single akshara if they are rendered conjoined. Now,

Re: Counting Devanagari Aksharas

2017-04-20 Thread Richard Wordingham via Unicode
On Thu, 20 Apr 2017 15:33:37 +0530 Shriramana Sharma via Unicode wrote: > All I can say is that Tamil script has eschewed most consonant cluster > ligatures/conjoining forms. As for Devanagari, writing श्रीमान्‌को (I > used ZWNJ) i.o. श्रीमान्को is quite possible with

Re: Counting Devanagari Aksharas

2017-04-20 Thread Manish Goregaokar via Unicode
I don't think there's consensus. When given a rendered representation people seem to uniformly count conjuncts as multiple aksharas if rendered with visible halant, and as a single akshara if they are rendered conjoined. Most fonts for devanagari these days are pretty good at conjoining
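
One possible counting convention along these lines can be sketched in code. This is a minimal sketch under stated assumptions, not an agreed definition: it assumes a virama joins the following consonant into the same akshara (as if the font conjoins them), that ZWNJ forces a visible halant and therefore starts a new akshara, and that only the basic Devanagari letter ranges matter. The function name `count_aksharas` and the `zwnj_splits` parameter are hypothetical.

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"     # ZERO WIDTH JOINER


def is_consonant(ch: str) -> bool:
    # KA..HA, plus the precomposed nukta letters QA..YYA (assumed sufficient here).
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"


def is_independent_vowel(ch: str) -> bool:
    # SHORT A..AU; later additions to the block are ignored in this sketch.
    return "\u0904" <= ch <= "\u0914"


def count_aksharas(text: str, zwnj_splits: bool = True) -> int:
    """Count aksharas, treating C + virama + C as a single conjoined akshara."""
    count = 0
    joining = False  # True while a virama is waiting to join the next consonant
    for ch in text:
        if ch == VIRAMA:
            joining = True
        elif ch == ZWNJ:
            if zwnj_splits:
                joining = False  # visible halant: the next consonant starts anew
        elif ch == ZWJ:
            continue  # leaves the joining state unchanged
        elif is_consonant(ch):
            if not joining:
                count += 1
            joining = False
        elif is_independent_vowel(ch):
            count += 1
            joining = False
        else:
            joining = False  # matras and other signs attach to the current akshara
    return count


conjoined = "\u0936\u094D\u0930\u0940\u092E\u093E\u0928\u094D\u0915\u094B"
with_zwnj = "\u0936\u094D\u0930\u0940\u092E\u093E\u0928\u094D\u200C\u0915\u094B"
print(count_aksharas(conjoined))  # 3
print(count_aksharas(with_zwnj))  # 4
```

The two test strings are the spellings with and without ZWNJ quoted elsewhere in this thread. A rendering-coupled count would additionally depend on whether the font actually conjoins a given cluster, which this sketch does not model.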

Re: Counting Devanagari Aksharas

2017-04-20 Thread Shriramana Sharma via Unicode
Hello Richard. Yes my earlier reply wasn't intended to be offlist. I have near-zero knowledge about non-Indic languages. All I can say is that Tamil script has eschewed most consonant cluster ligatures/conjoining forms. As for Devanagari, writing श्रीमान्‌को (I used ZWNJ) i.o. श्रीमान्को is quite

Re: Counting Devanagari Aksharas

2017-04-20 Thread Richard Wordingham via Unicode
I was offered the following reply: > To my knowledge except in Tamil script vowel less consonants in > written form aren't considered as separate "akshara"s in native > terminology. Word-finally they seem to be being treated as such. To be more precise, a final cluster of one or more consonants