On Tuesday 27 November 2012, Philippe Verdy <verd...@wanadoo.fr> wrote:

> This is not complicated to parse in the forward direction, but in the 
> backward direction, it means that when you see the final low surrogate, you 
> still need to roll back to the previous position: it can only be a leading 
> high surrogate of the BMP, **or** (this would be new) another low 
> surrogate encoding, for which you must still go back further to reach the 
> leading high surrogate. This requires a test when starting from a random 
> position, but at least it remains possible to know where the leading high 
> surrogate is.

Ah!

Does that mean that similar, or maybe worse, problems could arise when 
parsing, in the reverse direction, text that uses the idea in the following post?

http://www.unicode.org/mail-arch/unicode-ml/y2011-m08/0307.html

A possible solution, should the need arise for more planes, would be to define 
one code point in plane 0 to have a meaning such as EXTRA SURROGATE and then 
one could have a sequence such as

HIGH SURROGATE, EXTRA SURROGATE, LOW SURROGATE

That mechanism could provide another 16 planes at a cost of one plane 0 code 
point.

Would that be a satisfactory solution to the problem?
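
To make the question concrete, here is a rough Python sketch of the three-unit 
sequence and of finding the start of a character from a random position. It is 
only an illustration under stated assumptions, not a proposal for actual 
values: the EXTRA value is an arbitrary placeholder, the mapping of the extra 
16 planes onto U+110000..U+20FFFF is my own guess at how the bits would be 
used, and the function names are hypothetical.

```python
# Hypothetical placeholder for the proposed EXTRA SURROGATE code point.
# No such code point exists; the value is arbitrary and for illustration only.
EXTRA = 0xFFEF


def is_high(u):
    return 0xD800 <= u <= 0xDBFF


def is_low(u):
    return 0xDC00 <= u <= 0xDFFF


def encode(cp):
    """Return the list of 16-bit code units for code point cp.

    Planes 0..16 follow ordinary UTF-16. The assumed extra 16 planes
    (0x110000..0x20FFFF) use HIGH SURROGATE, EXTRA SURROGATE, LOW SURROGATE.
    """
    if cp < 0 or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable code point")
    if cp < 0x10000:
        return [cp]
    if cp < 0x110000:
        v = cp - 0x10000
        return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]
    v = cp - 0x110000
    if v >= 0x100000:
        raise ValueError("beyond the assumed extra 16 planes")
    return [0xD800 + (v >> 10), EXTRA, 0xDC00 + (v & 0x3FF)]


def start_of_char(units, i):
    """Return the index of the first code unit of the character holding units[i]."""
    u = units[i]
    if is_low(u):
        # Ordinary pair: the high surrogate is one step back.
        if i >= 1 and is_high(units[i - 1]):
            return i - 1
        # Proposed triple: EXTRA is one step back, the high surrogate two.
        if i >= 2 and units[i - 1] == EXTRA and is_high(units[i - 2]):
            return i - 2
        return i
    if u == EXTRA:
        # EXTRA is part of a triple only when it sits between a high and a
        # low surrogate, so both neighbours have to be inspected.
        if (i >= 1 and is_high(units[i - 1])
                and i + 1 < len(units) and is_low(units[i + 1])):
            return i - 1
        return i
    return i


units = encode(0x41) + encode(0x1F600) + encode(0x110000) + encode(0x20FFFF)
starts = [start_of_char(units, i) for i in range(len(units))]
```

In this sketch, a parser that lands on a low surrogate may have to step back 
two units rather than one, and a parser that lands on the EXTRA unit has to 
look at the units on both sides of it, since the same code unit could also 
stand alone as an ordinary plane 0 character.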

William Overington

27 November 2012
