Eike Rathke wrote:
Hi Stephan,
On Wednesday, 2007-05-09 14:10:35 +0200, Stephan Bergmann wrote:
1 Would there be legitimate use cases for rtl_uString_iterateCodePoints
to adjust an incoming index that points into the middle of a surrogate
pair, or would that only hide broken code?
I think that in its current state it would hide broken code more than
it would be useful. Instead, other functions like those mentioned in i76869
could be introduced, if synchronization is needed. On the other hand,
especially finding the start of a code point may be useful when
iterating backwards from the end of the string and a surrogate pair
occupies the last two code units. Maybe that's a special case?
sal_Int32 i = s.getLength();
s.iterateCodePoints(&i, -1);
will make i point to the start of the last character (if s is nonempty).
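To illustrate what a one-step move backwards has to do at the UTF-16 level, here is a minimal sketch. It uses std::u16string rather than OUString, and the helper names (isHighSurrogate, isLowSurrogate, stepBackOneCodePoint) are hypothetical, not part of sal/rtl; it only mirrors the behavior described above, it is not the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers (not part of sal/rtl): classify UTF-16 code units.
inline bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isLowSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Move index back by exactly one code point.
// Precondition (checked by the caller): 0 < index <= s.size().
inline std::size_t stepBackOneCodePoint(const std::u16string& s, std::size_t index) {
    assert(index > 0 && index <= s.size());
    --index;
    // If we landed on the second half of a well-formed surrogate pair,
    // step back once more so that index points at the start of the pair.
    if (isLowSurrogate(s[index]) && index > 0 && isHighSurrogate(s[index - 1])) {
        --index;
    }
    return index;
}
```

Starting from index == s.size(), one call yields the start of the last character, whether it occupies one or two code units.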
2 With the current setup where moving past the beginning or end of the
string is undefined behavior, is there any use for
postIncrementCodePoints outside [-1 .. 1]?
There may be in scenarios like "next I'll be interested in the character
after the next", so postIncrementCodePoints would be 2.
My point was that you can only safely make that call if you know that
there are at least two more code points after the current index, which
in general you can only know if you inspect the "surrogate structure" of
the OUString at the sal_Unicode level (which iterateCodePoints should
shield you from). (Whether you can safely make a call with
postIncrementCodePoints in [-1 .. 1] is easily checkable by the caller,
on the other hand.)
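As a sketch of why single steps are easy to guard: a forward step of one code point is safe whenever index < length, and a backward step whenever index > 0, so the caller never needs to look at surrogates. The code below assumes std::u16string in place of OUString, and stepForwardOneCodePoint and countCodePoints are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers (not part of sal/rtl): classify UTF-16 code units.
inline bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isLowSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Move index forward by exactly one code point.
// Precondition: index < s.size(), which the caller can check trivially.
inline std::size_t stepForwardOneCodePoint(const std::u16string& s, std::size_t index) {
    assert(index < s.size());
    char16_t c = s[index++];
    // Consume the low surrogate too if this is a well-formed pair.
    if (isHighSurrogate(c) && index < s.size() && isLowSurrogate(s[index])) {
        ++index;
    }
    return index;
}

// Count code points using only single steps guarded by "i < s.size()".
inline std::size_t countCodePoints(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); i = stepForwardOneCodePoint(s, i)) {
        ++n;
    }
    return n;
}
```

A step of two has no such simple guard: index + 2 <= length is not sufficient in general, because the two code units may form a single code point.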
Or would there be legitimate
use cases for rtl_uString_iterateCodePoints to stop moving past the
beginning/end of the string when postIncrementCodePoints is too large?
I think it should stop if it is called with indexUtf16 being "outside"
the string, or resulting in such a value, so -1 and length would be the
min/max resulting values. Also,
Why -1 instead of 0?
| @param postIncrementCodePoints
| the number of code points to move the given indexUtf16; can be negative.
| The value must be such that the resulting UTF-16 based index is in the
| range from zero to the length of this string (in UTF-16 code units),
| inclusive.
leaves the impression that in
sal_Int32 nIndex = str.getLength() - 1;
str.iterateCodePoints( &nIndex, 2 );
the value of postIncrementCodePoints would be invalid because it would
increment nIndex beyond the length. Instead, the function should limit
nIndex to str.getLength() upon return.
The nice thing about having it undefined behavior for now is that if
demand ever turns up to clip excessive moves at 0 or length,
then that can easily be implemented as a backwards compatible change.
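For reference, a clipping variant could look roughly like the sketch below. It is written against std::u16string, and moveCodePointsClamped is a hypothetical name; nothing here is the current rtl_uString_iterateCodePoints behavior, it only shows that clipping at 0 and length is a small addition.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers (not part of sal/rtl): classify UTF-16 code units.
inline bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isLowSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Move index by n code points (n may be negative), stopping at 0 or
// s.size() instead of running past either end of the string.
// Precondition: index <= s.size().
inline std::size_t moveCodePointsClamped(const std::u16string& s, std::size_t index, int n) {
    assert(index <= s.size());
    for (; n > 0 && index < s.size(); --n) {
        char16_t c = s[index++];
        if (isHighSurrogate(c) && index < s.size() && isLowSurrogate(s[index])) {
            ++index; // skip the second half of the pair
        }
    }
    for (; n < 0 && index > 0; ++n) {
        --index;
        if (isLowSurrogate(s[index]) && index > 0 && isHighSurrogate(s[index - 1])) {
            --index; // move to the first half of the pair
        }
    }
    return index;
}
```

With this variant, starting at length - 1 and moving by 2 ends at length, and any call that stays within the string gives the same result as before, which is what makes the change backwards compatible.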
-Stephan