Re: [interface-discuss] rtl::OUString::iterateCodePoints

Stephan Bergmann Wed, 30 May 2007 07:26:45 -0700

Eike Rathke wrote:

Hi Stephan,


On Wednesday, 2007-05-09 14:10:35 +0200, Stephan Bergmann wrote:

1. Would there be legitimate use cases for rtl_uString_iterateCodePoints to adjust an incoming index that points into the middle of a surrogate pair, or would that only hide broken code?


I think that in the current state it would hide broken code more than
it would be useful. Instead, other functions like those mentioned in
i76869 could be introduced if synchronization is needed. On the other
hand, finding the start of a code point may be especially useful when
iterating backwards from the end of the string and a surrogate pair
forms the last two code units. Maybe that's a special case?
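The kind of synchronization function Eike has in mind can be sketched as follows. This is a hypothetical helper, not part of the rtl API, and it does not reproduce the actual proposals in i76869. To keep it compilable without the UDK headers, std::u16string stands in for rtl::OUString and std::int32_t for sal_Int32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Surrogate ranges as defined by UTF-16.
inline bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isLowSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Hypothetical helper: if index points at the low half of a well-formed
// surrogate pair, move it back to the high half so that it designates the
// start of a code point. Any other index is returned unchanged.
std::int32_t syncToCodePointStart(const std::u16string& s, std::int32_t index) {
    assert(index >= 0 && static_cast<std::size_t>(index) <= s.size());
    if (index > 0 && static_cast<std::size_t>(index) < s.size()
        && isLowSurrogate(s[index]) && isHighSurrogate(s[index - 1])) {
        return index - 1;
    }
    return index;
}
```

Keeping this as a separate, explicitly named call, rather than having iterateCodePoints silently fix up a mid-pair index, matches the concern that automatic adjustment would mostly hide broken code.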


  sal_Int32 i = s.getLength();
  s.iterateCodePoints(&i, -1);

will make i point to the start of the last character (if s is nonempty).
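To illustrate why no special case is needed, here is a minimal sketch of only the index-moving part of iterateCodePoints (the function name moveByCodePoints is hypothetical and it computes no return value). std::u16string stands in for rtl::OUString and std::int32_t for sal_Int32. It assumes that an unpaired surrogate counts as one code point, and it turns the "undefined behavior" cases into debug assertions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

inline bool isHigh(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isLow(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Hypothetical stand-in for the index movement of iterateCodePoints.
// A well-formed surrogate pair is stepped over as a single code point;
// an unpaired surrogate is treated as a code point of its own (assumption).
void moveByCodePoints(const std::u16string& s, std::int32_t* index,
                      std::int32_t codePoints) {
    const std::int32_t len = static_cast<std::int32_t>(s.size());
    std::int32_t i = *index;
    for (; codePoints > 0; --codePoints) {
        assert(i < len);  // moving past the end is undefined behavior
        i += (isHigh(s[i]) && i + 1 < len && isLow(s[i + 1])) ? 2 : 1;
    }
    for (; codePoints < 0; ++codePoints) {
        assert(i > 0);    // moving before the beginning is undefined behavior
        i -= (isLow(s[i - 1]) && i - 2 >= 0 && isHigh(s[i - 2])) ? 2 : 1;
    }
    *index = i;
}
```

With s = u"a\U0001D11E" (three UTF-16 units), starting at s.size() and moving by -1 leaves the index at 1, the high surrogate of U+1D11E, not at 2 in the middle of the pair.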

2. With the current setup, where moving past the beginning or end of the string is undefined behavior, is there any use for postIncrementCodePoints outside [-1 .. 1]?
There may be, in scenarios like "next I'll be interested in the character
after the next", where postIncrementCodePoints would be 2.

My point was that you can only safely make that call if you know that there are at least two more code points after the current index, which in general you can only know if you inspect the "surrogate structure" of the OUString at the sal_Unicode level (which iterateCodePoints should shield you from). (Whether you can safely make a call with postIncrementCodePoints in [-1 .. 1] is easily checkable by the caller, on the other hand.)
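The asymmetry Stephan describes can be made concrete with two hypothetical caller-side checks (neither exists in rtl; std::u16string stands in for rtl::OUString and std::int32_t for sal_Int32). For a move of -1, 0 or +1 the index and the length suffice; for a move of two or more code points the caller has to walk the UTF-16 units and look at surrogates.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

inline bool isHigh(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isLow(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// For n in [-1 .. 1] only the index and the length are needed.
bool canMoveByOne(std::int32_t length, std::int32_t index, std::int32_t n) {
    assert(n >= -1 && n <= 1);
    if (n > 0) return index < length;
    if (n < 0) return index > 0;
    return true;
}

// For larger forward moves the caller must inspect the surrogate structure
// itself, which is exactly what iterateCodePoints is meant to hide.
bool canMoveForward(const std::u16string& s, std::int32_t index, std::int32_t n) {
    const std::int32_t len = static_cast<std::int32_t>(s.size());
    assert(index >= 0 && index <= len && n >= 0);
    for (; n > 0; --n) {
        if (index >= len) return false;
        index += (isHigh(s[index]) && index + 1 < len && isLow(s[index + 1])) ? 2 : 1;
    }
    return true;
}
```

For u"\U0001D11Eb" there are three UTF-16 units but only two code points, so a move of 3 from index 0 is not safe even though a naive unit count would suggest otherwise.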

Or would there be legitimate use cases for rtl_uString_iterateCodePoints to stop moving past the beginning/end of the string when postIncrementCodePoints is too large?
I think it should stop if it is called with indexUtf16 being "outside"
the string, or if the move would result in such a value, so -1 and
length would be the min/max resulting values. Also,


Why -1 instead of 0?

| @param postIncrementCodePoints
| the number of code points to move the given indexUtf16; can be negative.
| The value must be such that the resulting UTF-16 based index is in the
| range from zero to the length of this string (in UTF-16 code units),
| inclusive.

leaves the impression that in

sal_Int32 nIndex = str.getLength() - 1;
str.iterateCodePoints( &nIndex, 2 );

the value of postIncrementCodePoints would be invalid because it would
increment nIndex beyond the length. Instead, the function should limit
nIndex to str.getLength() upon return.
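For comparison, the clamping behavior Eike proposes could look like the sketch below. It is hypothetical and is not the documented rtl behavior; it clamps to [0, length] rather than to [-1, length], since Stephan questions the -1 lower bound. std::u16string stands in for rtl::OUString and std::int32_t for sal_Int32.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

inline bool isHigh(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isLow(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Hypothetical clamping variant: excessive moves stop at 0 or at the
// length instead of being undefined behavior.
void moveByCodePointsClamped(const std::u16string& s, std::int32_t* index,
                             std::int32_t codePoints) {
    assert(index != nullptr);
    const std::int32_t len = static_cast<std::int32_t>(s.size());
    std::int32_t i = *index;
    // An incoming index outside [0, len] is clamped first.
    if (i < 0) i = 0;
    if (i > len) i = len;
    for (; codePoints > 0 && i < len; --codePoints)
        i += (isHigh(s[i]) && i + 1 < len && isLow(s[i + 1])) ? 2 : 1;
    for (; codePoints < 0 && i > 0; ++codePoints)
        i -= (isLow(s[i - 1]) && i - 2 >= 0 && isHigh(s[i - 2])) ? 2 : 1;
    *index = i;
}
```

Under this variant, the example above (nIndex = getLength() - 1, moved by 2) ends at getLength() instead of being undefined.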

The nice thing about leaving it undefined behavior for now is that if demand ever turns up to clip excessive moves at 0 or at length, that can easily be implemented as a backwards-compatible change.


-Stephan
