On 13 Mar 2017, at 17:55, J Decker <d3c...@gmail.com> wrote:
> 
> I liked the Go implementation of a character type (a rune type), which is a 
> code point, and strings that return runes by index.
> https://blog.golang.org/strings

IMO, returning code points by index is a mistake.  It over-emphasises the 
importance of the code point, which reinforces the notion in some developers’ 
minds that code points are somehow “characters”.  It also leads people to use 
UCS-4 as an internal representation unnecessarily, even though in practice it 
seems to have very few advantages over UTF-16.
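
A minimal Go sketch of the point, assuming the decomposed spelling of “é” 
(U+0065 followed by U+0301 COMBINING ACUTE ACCENT): one user-perceived 
character is two code points however it is stored, so a fixed-width UCS-4 
representation still does not give one unit per character.  The sketch also 
shows that indexing a Go string yields a byte, not a rune; runes come from 
ranging over the string.  The helper name `counts` is illustrative only.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// counts reports the length of s in UTF-8 code units (bytes), code points
// (runes) and UTF-16 code units.
func counts(s string) (bytes, runes, utf16Units int) {
	return len(s), utf8.RuneCountInString(s), len(utf16.Encode([]rune(s)))
}

func main() {
	// "é" in decomposed form: 'e' (U+0065) + COMBINING ACUTE ACCENT (U+0301).
	// A reader sees one character, but it is two code points, so even a
	// fixed-width UCS-4 representation needs two units for it.
	s := "e\u0301"

	b, r, u := counts(s)
	fmt.Println("UTF-8 code units:", b)  // 3
	fmt.Println("code points:", r)       // 2
	fmt.Println("UTF-16 code units:", u) // 2

	// Indexing a Go string yields a byte (uint8), not a rune.
	fmt.Printf("s[0] has type %T\n", s[0]) // uint8

	// Ranging over a string decodes runes; i is the byte offset of each one.
	for i, c := range s {
		fmt.Printf("byte offset %d: %U\n", i, c)
	}
}
```

Note that for this string UTF-16 and UCS-4 both need two units; neither 
encoding makes the code point line up with what the user thinks of as a 
character.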

> Doesn't solve the problem for composited codepoints though... 
> 
> texel looks to be defined as a graphic element already.  TEXture ELement.

Yes, but I thought the proposal was “textel”, with the extra “t”.  Re-using 
“texel” would be quite inappropriate; there are certainly people who work on 
rendering software who would strongly object to that, for very good reasons.

I would caution, however, that there is already a lot of terminology 
associated with Unicode (perhaps for understandable reasons), and if the word 
“textel” is going to have a definition that differs from, say, an extended 
grapheme cluster, a great deal of thought should go into exactly what that 
definition should be.  We already have “characters”, code units, code points, 
combining sequences, graphemes, grapheme clusters, extended grapheme clusters 
and probably other things I’ve missed off that list.  Merely adding yet 
another term isn’t going to fix the problem of developers misunderstanding, or 
simply not being aware of, the correct terminology or some aspect of Unicode’s 
behaviour.
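
To show how far apart several of those units can be, here is a minimal Go 
sketch that measures the same strings as UTF-8 code units, UTF-16 code units, 
code points and (approximately) grapheme clusters.  Go’s standard library has 
no extended grapheme cluster segmentation, so `approxClusters` is a 
hypothetical, deliberately crude stand-in (a new cluster at every rune that is 
not a combining mark), not an implementation of UAX #29.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf16"
	"unicode/utf8"
)

// approxClusters is a hypothetical helper that roughly approximates extended
// grapheme clusters: it starts a new cluster at every rune that is not a
// combining mark (Unicode general category M). It is NOT the UAX #29
// algorithm; ZWJ emoji sequences, Hangul jamo, regional-indicator flags and
// other cases are not handled.
func approxClusters(s string) int {
	n := 0
	for _, r := range s {
		// A leading mark with nothing to attach to still counts as a cluster.
		if n == 0 || !unicode.IsMark(r) {
			n++
		}
	}
	return n
}

func main() {
	// Decomposed "café", precomposed "café" (U+00E9), and U+1F600 (an emoji
	// outside the BMP, which needs a surrogate pair in UTF-16).
	for _, s := range []string{"cafe\u0301", "caf\u00e9", "\U0001F600"} {
		fmt.Printf("%q: UTF-8 units=%d, UTF-16 units=%d, code points=%d, ~clusters=%d\n",
			s, len(s), len(utf16.Encode([]rune(s))), utf8.RuneCountInString(s), approxClusters(s))
	}
}
```

The two spellings of “café” have the same number of user-perceived characters 
but different numbers of code points and code units, which is exactly the kind 
of distinction the existing terminology is meant to capture.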

Kind regards,

Alastair.

--
http://alastairs-place.net

