> Do Hindi speakers really think of orthographic syllables as characters?
When rendered as a cluster, yes? I've asked around, and folks seem to
insist on coupling it to the rendering.

Given that most fonts render *normal* (common, etc.) clusters, I think
making them EGCs and treating nonrendered clusters the same way we
treat family emoji is fine. (Family emoji of length 5 are a single EGC;
that's not what the user actually perceives, but the case is rare
enough in the wild that it doesn't matter.)

The way I see it, the current system is wrong, and so would be the
proposed system of not breaking at viramas (or, to be more precise, not
breaking at viramas followed by a consonant), but the proposed system
would be wrong much less often.

I am only talking about Devanagari, though scripts like
Bangla/Gujarati/Gurmukhi may have similar needs.

Breaking on ZWNJ seems sensible.

-Manish

On Fri, Apr 21, 2017 at 4:04 PM, Richard Wordingham via Unicode <unicode@unicode.org> wrote:
> On Thu, 20 Apr 2017 11:17:05 -0700
> Manish Goregaokar via Unicode <unicode@unicode.org> wrote:
>
>> On Wed, Apr 19, 2017 at 4:35 PM, Richard Wordingham via Unicode
>> <unicode@unicode.org> wrote:
>
>> > Is there consensus on how to count aksharas in the Devanagari
>> > script? The doubts I have relate to a visible halant in
>> > orthographic syllables other than the first.
>
>> I don't think there's consensus.
>
> I've found related discussion at
> https://lists.w3.org/Archives/Public/public-i18n-indic/. The question
> of how to count was raised and not answered there.
>
>> On Wed, Apr 19, 2017 at 4:35 PM,
>> Richard Wordingham via Unicode <unicode@unicode.org> wrote:
>> > Is there consensus on how to count aksharas in the Devanagari
>> > script? The doubts I have relate to a visible halant in
>> > orthographic syllables other than the first.
>
>> I'm of the opinion that Unicode should start considering devanagari
>> (and possibly other indic) consonant clusters as single extended
>> grapheme clusters.
>
> Do Hindi speakers really think of orthographic syllables as characters?
>
> What may be useful is the concept of a definition of an orthographic
> syllable. It may be possible to get the information from a font -
> depending on the renderer - but a locale-dependent definition should be
> possible for use as a fall-back. Devanagari rules won't work for
> Tamil, and I think rules for Hindi and Nepali will be slightly
> different - <VIRAMA, ZWNJ> looks like a problem.
>
> The concept is possibly not useful in some Indic scripts - the concept
> won't work well in Thai, but will work in Pali in the Thai script, for
> both Pali orthographies.
>
> Richard.
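
For concreteness, here is a minimal Python sketch of the rule discussed
in this thread: do not break after a virama that is followed by a
consonant, but still break when ZWNJ follows the virama. It is an
illustration under stated assumptions, not an implementation of
UAX #29. The helper names (simple_clusters, akshara_clusters) are
hypothetical, the "current" segmentation is approximated as one base
character plus any following combining marks or joiners, and the
consonant ranges cover only the core Devanagari block. ZWJ after a
virama is not treated as linking here.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"     # ZERO WIDTH JOINER


def is_devanagari_consonant(ch):
    # Assumption: KA..HA plus the precomposed nukta consonants QA..YYA.
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"


def is_extender(ch):
    # Rough stand-in for "does not start a new cluster":
    # combining marks (Mn, Mc, Me) and the two joiner controls.
    return unicodedata.category(ch).startswith("M") or ch in (ZWNJ, ZWJ)


def simple_clusters(text):
    # Approximation of the current behaviour: a base character
    # followed by its marks and joiners forms one cluster.
    clusters = []
    for ch in text:
        if clusters and is_extender(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def akshara_clusters(text):
    # Proposed rule: glue a cluster onto the previous one when the
    # previous one ends in a virama and this one starts with a
    # consonant. A cluster ending in <VIRAMA, ZWNJ> ends in ZWNJ,
    # so the break after it is kept.
    merged = []
    for cluster in simple_clusters(text):
        if (merged and merged[-1].endswith(VIRAMA)
                and is_devanagari_consonant(cluster[0])):
            merged[-1] += cluster
        else:
            merged.append(cluster)
    return merged


# KA VIRAMA SSA TA VIRAMA RA VOWEL-SIGN-I YA
word = "\u0915\u094D\u0937\u0924\u094D\u0930\u093F\u092F"
print(len(simple_clusters(word)), len(akshara_clusters(word)))
```

On that sample word the approximated current rule yields five clusters
and the virama rule yields three. A real implementation would take the
character classes from Unicode property data rather than hard-coded
ranges, and per-language differences (Hindi vs. Nepali, as noted in the
quoted message) are not modelled.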