Re: abstract characters, semantics, meaningful transformations ... Was: Tibetan Paluta

Alastair Houghton via Unicode Mon, 01 May 2017 08:33:13 -0700

On 1 May 2017, at 15:19, Naena Guru via Unicode <unicode@unicode.org> wrote:
> 
> This whole attempt to make digitizing Indic script some esoteric, 'abstract', 
> 'semantic representation' and so on seems to me is an attempt to make Unicode 
> the realm of the some super humans.


No.  It’s important so that the standard Unicode algorithms function acceptably 
for Indic languages.  The design of Unicode is such that, compatibility 
characters and some other special cases aside, it encodes semantics as opposed 
to graphic representations.
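
A minimal sketch of that distinction, assuming Python's standard `unicodedata` module; the sample strings are chosen here purely for illustration and are not from the original discussion:

```python
import unicodedata

# Compatibility character: U+FB01 encodes a glyph shape (the "fi" ligature).
ligature = "\ufb01"
folded = unicodedata.normalize("NFKC", ligature)  # compatibility folding

# Devanagari conjunct "ksha": usually drawn as a single glyph, but encoded as
# the letters it is made of (KA + VIRAMA + SSA).
ksha = "\u0915\u094d\u0937"
names = [unicodedata.name(ch) for ch in ksha]

# Because the letters themselves are encoded, a plain substring search
# finds KA inside the conjunct.
contains_ka = "\u0915" in ksha

print(folded, names, contains_ka)
```

The ligature is one of the compatibility exceptions; the conjunct shows the general rule, where the shape is left to the font and the stored text stays searchable.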

> The purpose of writing is to represent speech.

Yes, and Unicode is intended to give us a representation of speech *that is 
amenable to machine processing*.

The other extreme is what used to happen on many Chinese and Japanese websites, 
namely “representing speech” by way of an image - if you want to process the 
text in one of those images, well, good luck with that (you’ll want to start 
with some kind of OCR).

Perhaps part of the problem here is that Unicode sits at the intersection 
of linguistics and software engineering.  Discussion of either side is likely 
to be quite technical, and some of the vocabulary used might well seem like 
“mumbo jumbo”, just as some of the design decisions might not make sense if 
your expertise is mainly on one side or mainly on the other (or, for that 
matter, if you have little exposure to other languages or the challenges 
inherent in encoding or rendering them).  However, for all that it might 
*sound* like “mumbo jumbo” to you, it is not.

Kind regards,

Alastair.

--
http://alastairs-place.net
