Eike Rathke wrote:
Hi Stephan,

On Wednesday, 2007-05-09 14:10:35 +0200, Stephan Bergmann wrote:

1. Would there be legitimate use cases for rtl_uString_iterateCodePoints to adjust an incoming index that points into the middle of a surrogate pair, or would that only hide broken code?

I think that in the current state it would hide broken code more than
it would be useful. Instead, other functions like those mentioned in
i76869 could be introduced if synchronization is needed. On the other
hand, finding the start of a code point may be especially useful when
iterating backwards from the end of the string and a surrogate pair
occupies the last two code units. Maybe that's a special case?

  sal_Int32 i = s.getLength();
  s.iterateCodePoints(&i, -1);

will make i point to the start of the last character (if s is nonempty).
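
To make the backward step concrete, here is a minimal standalone sketch on std::u16string. It is not the sal/rtl implementation: the helper name stepBackOneCodePoint is hypothetical, and returning the code point that starts at the updated index is a choice made for this illustration. The precondition (index greater than zero) is asserted rather than clipped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of sal/rtl: moves *index back by one code
// point in a UTF-16 string and returns the code point that starts at the
// updated index. Precondition: 0 < *index <= s.size().
char32_t stepBackOneCodePoint(const std::u16string& s, std::size_t* index) {
    assert(index != nullptr && *index > 0 && *index <= s.size());
    std::size_t i = *index - 1;
    char16_t low = s[i];
    // If the previous code unit is a low surrogate preceded by a high
    // surrogate, the code point starts one code unit earlier.
    if (low >= 0xDC00 && low <= 0xDFFF && i > 0) {
        char16_t high = s[i - 1];
        if (high >= 0xD800 && high <= 0xDBFF) {
            *index = i - 1;
            return 0x10000 + ((char32_t(high) - 0xD800) << 10)
                           + (char32_t(low) - 0xDC00);
        }
    }
    // Otherwise the previous code unit is a code point on its own
    // (or an unpaired surrogate, which is passed through unchanged).
    *index = i;
    return low;
}
```

Starting from index == s.size(), one call leaves the index at the start of the last character, whether that character takes one code unit or two.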

2. With the current setup where moving past the beginning or end of the string is undefined behavior, is there any use for postIncrementCodePoints outside [-1 .. 1]?

There may be in scenarios like "next I'll be interested in the character
after the next", so postIncrementCodePoints would be 2.

My point was that you can only safely make that call if you know that there are at least two more code points after the current index, which in general you can only know if you inspect the "surrogate structure" of the OUString at the sal_Unicode level (which iterateCodePoints should shield you from). (Whether you can safely make a call with postIncrementCodePoints in [-1 .. 1] is easily checkable by the caller, on the other hand.)
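
As a rough illustration of that asymmetry, the sketch below uses std::u16string in place of OUString. Both helper names (canMoveByOne, codePointsFrom) are hypothetical and exist only for this example. The check for a move by one needs nothing but the index and the length, while the check for a larger move has to walk the code units and look at surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration; neither is part of sal/rtl.

// A move by +1 is safe whenever at least one code unit follows the index,
// and a move by -1 whenever at least one code unit precedes it.
bool canMoveByOne(const std::u16string& s, std::size_t index, int direction) {
    assert(direction == 1 || direction == -1);
    return direction > 0 ? index < s.size() : index > 0;
}

// For larger forward moves the caller has to inspect the surrogate
// structure: count the code points from index to the end of the string.
std::size_t codePointsFrom(const std::u16string& s, std::size_t index) {
    std::size_t count = 0;
    while (index < s.size()) {
        bool high = s[index] >= 0xD800 && s[index] <= 0xDBFF;
        bool pair = high && index + 1 < s.size()
                    && s[index + 1] >= 0xDC00 && s[index + 1] <= 0xDFFF;
        index += pair ? 2 : 1;  // a surrogate pair is one code point
        ++count;
    }
    return count;
}
```

With two code units left that happen to form a surrogate pair, canMoveByOne reports that a forward move is fine, but codePointsFrom returns 1, so a move by 2 would run past the end.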

Or would there be legitimate use cases for rtl_uString_iterateCodePoints to stop moving past the beginning/end of the string when postIncrementCodePoints is too large?

I think it should stop if it is called with indexUtf16 being "outside"
the string, or resulting in such a value, so -1 and length would be the
min/max resulting values. Also,

Why -1 instead of 0?

| @param postIncrementCodePoints
| the number of code points to move the given indexUtf16; can be negative.
| The value must be such that the resulting UTF-16 based index is in the
| range from zero to the length of this string (in UTF-16 code units),
| inclusive.

leaves the impression that in

  sal_Int32 nIndex = str.getLength() - 1;
  str.iterateCodePoints( &nIndex, 2 );

the value of postIncrementCodePoints would be invalid because it would
increment nIndex beyond the length. Instead, the function should limit
nIndex to str.getLength() upon return.

The nice thing about leaving it undefined behavior for now is that if demand ever turns up to clip excessive moves at 0 or at the length, that can easily be implemented as a backwards-compatible change.
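
For comparison, a clipping variant could look roughly like the sketch below. It works on std::u16string, the name moveClamped and the surrogate helpers are hypothetical, and it only shows what clipping at 0 and at the length would mean. It is not a proposal for the actual rtl signature.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration; not part of sal/rtl.
bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
bool isLowSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Moves index by n code points (n may be negative) and stops at 0 or at
// s.size() instead of running past either end of the string.
std::size_t moveClamped(const std::u16string& s, std::size_t index, int n) {
    assert(index <= s.size());
    // Forward: skip one code point per step until n is used up or the end
    // of the string is reached.
    for (; n > 0 && index < s.size(); --n) {
        bool pair = isHighSurrogate(s[index]) && index + 1 < s.size()
                    && isLowSurrogate(s[index + 1]);
        index += pair ? 2 : 1;
    }
    // Backward: the same, stopping at the start of the string.
    for (; n < 0 && index > 0; ++n) {
        bool pair = index >= 2 && isLowSurrogate(s[index - 1])
                    && isHighSurrogate(s[index - 2]);
        index -= pair ? 2 : 1;
    }
    return index;
}
```

Every call that is valid under the current contract gives the same resulting index here, which is why clipping could be added later without breaking existing callers.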

-Stephan
