On Tuesday 27 November 2012, Philippe Verdy <verd...@wanadoo.fr> wrote:
> This is not complicated to parse in the forward direction, but for the
> backward direction, it means that when you see the final low surrogate, you
> still need to roll back to the previous position: it can only be a leading
> high surrogate of the BMP, **or** (this would be new) another low
> surrogate encoding, for which you must still go back to get the leading high
> surrogate. This requires a test if starting from a random position, but at
> least it remains possible to know where the leading high surrogate is.

Ah!

Does that mean that similar, or maybe worse, problems could arise when parsing in the reverse direction with the idea in the following post?

http://www.unicode.org/mail-arch/unicode-ml/y2011-m08/0307.html

A possible solution, should the need arise for more planes, would be to define one code point in plane 0 to have a meaning such as EXTRA SURROGATE, and then one could have a sequence such as

HIGH SURROGATE, EXTRA SURROGATE, LOW SURROGATE

That mechanism could provide another 16 planes at a cost of one plane 0 code point.

Would that be a satisfactory solution to the problem?

William Overington

27 November 2012
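
To make the reverse-direction question concrete, here is a minimal sketch in Python of finding the start of a character from an arbitrary position under the HIGH SURROGATE, EXTRA SURROGATE, LOW SURROGATE idea. It rests on assumptions that are not part of any standard: the value chosen for `EXTRA` is only a placeholder, the helper names are made up for illustration, and the sketch assumes EXTRA SURROGATE is meaningful only between a high and a low surrogate.

```python
# Placeholder value for the proposed EXTRA SURROGATE code unit.
# This is an assumption for illustration only, not an assigned meaning.
EXTRA = 0x2FE0


def is_high(unit):
    """True if the 16-bit code unit is a high (leading) surrogate."""
    return 0xD800 <= unit <= 0xDBFF


def is_low(unit):
    """True if the 16-bit code unit is a low (trailing) surrogate."""
    return 0xDC00 <= unit <= 0xDFFF


def char_start(units, i):
    """Return the index of the first code unit of the character containing units[i].

    Handles ordinary pairs (HIGH, LOW) and the hypothetical
    three-unit form (HIGH, EXTRA, LOW). Ill-formed input is treated
    as a sequence of single-unit characters.
    """
    unit = units[i]
    if is_low(unit) and i >= 1:
        if is_high(units[i - 1]):
            return i - 1  # ordinary surrogate pair
        if units[i - 1] == EXTRA and i >= 2 and is_high(units[i - 2]):
            return i - 2  # hypothetical HIGH, EXTRA, LOW sequence
    elif unit == EXTRA and 1 <= i < len(units) - 1:
        # In the middle of a three-unit sequence: check both neighbours.
        if is_high(units[i - 1]) and is_low(units[i + 1]):
            return i - 1
    return i  # BMP character, leading surrogate, or ill-formed unit


units = [0x0041, 0xD800, 0xDC00, 0xD800, EXTRA, 0xDC00, 0x0042]
starts = [char_start(units, i) for i in range(len(units))]
```

Under these assumptions the backward step never needs to look more than two code units back, plus one unit forward when the starting position happens to be the EXTRA SURROGATE itself.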