Re: [Unicode] tablature characters for the Chinese guqin

2014-05-30 Thread suzuki toshiya
It seems that my first response to this discussion was not delivered because my attached image was too big. I'm sorry; please let me post a revised version... -- China National Body once reported that they had a plan to encode the characters for the tablature, in IRG:

Re: [Unicode] tablature characters for the Chinese guqin

2014-05-30 Thread suzuki toshiya
and that we have a bunch of base characters, we easily reach 10 and more characters if all possible permutations are encoded, and this is certainly not what Unicode wants :-) Indeed. Some people may want to encode the tablature characters as precomposed glyphs in square metric, and unify

Re: [Unicode] tablature characters for the Chinese guqin

2014-05-30 Thread Andrew West
On 30 May 2014 05:50, suzuki toshiya mpsuz...@hiroshima-u.ac.jp wrote: BTW, a few (only one?) characters for the latter style are sampled in a normal dictionary CiYuan, and will be included in CJK Unified Ideograph Extension F. I hope not. Just because it occurs in a Chinese dictionary does

Corrigendum #9

2014-05-30 Thread Karl Williamson
I'm having a problem with this http://www.unicode.org/versions/corrigendum9.html Some people now think it means that noncharacters are really no different from private-use characters, and should be treated very similarly if not identically. It seems to me that they should be illegal in open

Unicode Regular Expressions, Surrogate Points and UTF-8

2014-05-30 Thread Richard Wordingham
Is there any good reason for UTS#18 'Unicode Regular Expressions' to express its requirements in terms of codepoints rather than scalar values? I was initially worried by RL1.1 requiring that one be able to specify surrogate codepoints in a pattern. It would not be compliant for an application
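To make the code point versus scalar value distinction concrete, here is a minimal sketch in Python, whose `str` type is a sequence of code points and can therefore hold lone surrogates. The helper names `first_surrogate` and `is_utf8_encodable` are hypothetical, chosen for illustration only, and the sketch says nothing about what UTS #18 conformance actually requires of an implementation.

```python
import re

# A pattern that names the surrogate code points U+D800..U+DFFF directly.
# Python's re module works on code points, so this range is accepted.
SURROGATE = re.compile(r"[\ud800-\udfff]")


def first_surrogate(text: str):
    """Return the index of the first surrogate code point in text, or None."""
    match = SURROGATE.search(text)
    return match.start() if match else None


def is_utf8_encodable(text: str) -> bool:
    """Return True if strict UTF-8 can encode text.

    Strict UTF-8 covers scalar values only, so any surrogate code point
    makes the encode step fail.
    """
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True
```

A string such as "a\ud800b" can be searched with the pattern above, yet it cannot be serialised as well-formed UTF-8, which is the tension the question is about.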

Re: Corrigendum #9

2014-05-30 Thread Asmus Freytag
On 5/30/2014 11:26 AM, Karl Williamson wrote: I'm having a problem with this http://www.unicode.org/versions/corrigendum9.html You are not alone. Some people now think it means that noncharacters are really no different from private-use characters, and should be treated very similarly if
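For readers following the noncharacter versus private-use distinction, the two sets are easy to tell apart by code point alone. The sketch below uses the ranges defined in the Unicode Standard; the function names are hypothetical, and the code takes no position on how either set should be treated in interchange.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True for the 66 noncharacter code points.

    These are U+FDD0..U+FDEF plus the last two code points of each of the
    17 planes (U+nFFFE and U+nFFFF).
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_private_use(cp: int) -> bool:
    """Return True for code points in the three private-use ranges."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    return (
        0xE000 <= cp <= 0xF8FF          # BMP Private Use Area
        or 0xF0000 <= cp <= 0xFFFFD     # Supplementary Private Use Area-A
        or 0x100000 <= cp <= 0x10FFFD   # Supplementary Private Use Area-B
    )
```

The two predicates never overlap: planes 15 and 16 end their private-use ranges at U+FFFFD and U+10FFFD, just before the noncharacters of those planes.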

Re: Block Boundaries (was: RE: Corrigendum #9)

2014-05-30 Thread Markus Scherer
In addition, the Block property is not particularly useful even in regular expressions or other processing. It is almost always more useful to use Script, Alphabetic, Unified_Ideograph, etc. Blocks help with planning and allocation but little else. markus
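A small sketch of why a block test is a blunt instrument, assuming Python's standard `unicodedata` module. That module exposes neither the Block nor the Script property, so two stand-ins are used here and both are assumptions of this example: the Greek and Coptic block is hard-coded as the range U+0370..U+03FF, and the character name prefix serves as a rough proxy for a script test. The function names are hypothetical.

```python
import unicodedata

# Block "Greek and Coptic", hard-coded because unicodedata has no Block property.
GREEK_AND_COPTIC = range(0x0370, 0x0400)


def in_greek_and_coptic_block(ch: str) -> bool:
    """Block-style test: is the code point inside the allocated range?"""
    return ord(ch) in GREEK_AND_COPTIC


def looks_greek(ch: str) -> bool:
    """Rough stand-in for a Script=Greek test, based on the character name.

    A real implementation would consult the Script property instead.
    """
    return unicodedata.name(ch, "").startswith("GREEK ")


def is_unassigned(ch: str) -> bool:
    """True if the code point has general category Cn (unassigned)."""
    return unicodedata.category(ch) == "Cn"
```

With these helpers, U+03E2 (a Coptic letter) and U+0378 (unassigned) both pass the block test, while U+1F00 (a Greek letter in the Greek Extended block) fails it.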

Re: Block Boundaries (was: RE: Corrigendum #9)

2014-05-30 Thread Richard Wordingham
On Fri, 30 May 2014 13:22:58 -0700 Markus Scherer markus@gmail.com wrote: In addition, the Block property is not particularly useful even in regular expressions or other processing. It is almost always more useful to use Script, Alphabetic, Unified_Ideograph, etc. Blocks help with

Re: Unicode Regular Expressions, Surrogate Points and UTF-8

2014-05-30 Thread Markus Scherer
If you use Unicode 16-bit strings, it's easy to pass through unpaired surrogates and treat them like code points; it's often not productive or necessary to check for them all the time, that is, to be strict about UTF-16. On the other hand, I don't think anyone expects you to support invalid
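The lenient versus strict handling of 16-bit strings described here can be sketched as follows. This is an illustration under stated assumptions, not any library's actual behaviour: the function name `decode_utf16_units` and its `strict` flag are hypothetical, and the input is a plain list of 16-bit integers.

```python
def decode_utf16_units(units, strict=False):
    """Convert a sequence of 16-bit code units into a list of code points.

    Well-formed surrogate pairs are combined into supplementary code points.
    An unpaired surrogate is passed through as its own code point when
    strict is False, and rejected with ValueError when strict is True.
    """
    out = []
    i = 0
    n = len(units)
    while i < n:
        unit = units[i]
        is_lead = 0xD800 <= unit <= 0xDBFF
        if is_lead and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Lead surrogate followed by a trail surrogate: one code point.
            trail = units[i + 1]
            out.append(0x10000 + ((unit - 0xD800) << 10) + (trail - 0xDC00))
            i += 2
            continue
        if strict and 0xD800 <= unit <= 0xDFFF:
            raise ValueError(f"unpaired surrogate at index {i}")
        out.append(unit)
        i += 1
    return out
```

In lenient mode the only extra work is the pair check already needed for supplementary characters, which is why passing unpaired surrogates through is cheap.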