On 8 Apr 2018, at 16:27, Richard Frith-Macdonald 
<richard.frith-macdon...@theengagehub.com> wrote:

>> I also note that a lot of the NSString method implementations are not well 
>> optimised.
> Yes ... because they are almost never used as we historically had unicode 
> string methods and latin1 string methods.  I did optimise the more 
> 'important' (ie ones causing trouble in the test applications I tried) ones 
> though.

It probably depends a lot on the application.  A number of things I’ve written 
have used custom NSString subclasses to keep the data in some other format for 
interoperability with some other library.  Language bridges also typically come 
with custom NSString subclasses that wrap the native string representation.  In 
these cases, the NSString implementations are used all of the time.

>> In a number of places, -characterAtIndex: is called repeatedly, when 
>> -getCharacters:range: is normally significantly more efficient.
> You have to be very careful about using -getCharacters:range: to give more 
> efficiency, and also worry about extra complexity to put buffers on stack or 
> heap (or work in subsections of strings copied to a stack buffer etc).  I 
> remember quite a few cases where more complex code 'optimised' to work that 
> way turned out to be slower for common cases.

In Étoilé, we had some macros for iterating over all characters in a string 
using -getCharacters:range:.  In Objective-C++ it’s also quite easy to write a 
wrapper that you can use with C++11 range-based for loops to iterate over 
unichars in an NSString (I’ve not done this for NSString, but I have for 
NSIndexSet, which provides a similar interface).
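
As a rough illustration of that wrapper idea, here is a minimal sketch in plain 
C++11 rather than Objective-C++.  StringSource is a hypothetical stand-in for 
an NSString whose only primitives are a length and a -getCharacters:range: 
analogue; CharacterRange, kChunkSize and collectCharacters are likewise names 
made up for this example and are not part of GNUstep or Étoilé.  The iterator 
refills a small buffer one chunk at a time, so the underlying string is asked 
for characters once per chunk rather than once per character.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical stand-in for an NSString-like object (an assumption for this
// sketch): it exposes only a length and a bulk character fetch.
struct StringSource {
    std::u16string storage;
    std::size_t length() const { return storage.size(); }
    // Analogue of -getCharacters:range: (copies count code units to out).
    void getCharacters(char16_t *out, std::size_t location,
                       std::size_t count) const {
        std::copy_n(storage.data() + location, count, out);
    }
};

// Number of UTF-16 code units fetched per refill (arbitrary choice).
static const std::size_t kChunkSize = 64;

// Adapter that makes a StringSource usable in a range-based for loop.
class CharacterRange {
public:
    explicit CharacterRange(const StringSource &s) : source(s) {}

    class iterator {
    public:
        iterator(const StringSource *s, std::size_t i) : source(s), index(i) {}
        char16_t operator*() {
            // Refill when the current index is outside the buffered window.
            if (index < start || index >= start + filled) refill();
            return buffer[index - start];
        }
        iterator &operator++() { ++index; return *this; }
        bool operator!=(const iterator &other) const {
            return index != other.index;
        }
    private:
        void refill() {
            start = index;
            filled = std::min(kChunkSize, source->length() - index);
            source->getCharacters(buffer, start, filled);
        }
        const StringSource *source;
        std::size_t index;
        std::size_t start = 0;
        std::size_t filled = 0;
        char16_t buffer[kChunkSize];
    };

    iterator begin() const { return iterator(&source, 0); }
    iterator end() const { return iterator(&source, source.length()); }

private:
    const StringSource &source;
};

// Example use: walk every code unit with a range-based for loop.
inline std::u16string collectCharacters(const StringSource &s) {
    std::u16string out;
    for (char16_t c : CharacterRange(s)) out.push_back(c);
    return out;
}
```

The buffer lives inside the iterator, so it ends up on the stack along with 
the loop's other state and no heap allocation is needed for the traversal.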

In the newabi branch, I’ve modified NSString to compute its hash using an 
on-stack buffer and repeated calls to -getCharacters:range: in a loop, rather 
than copying the entire string first:


I don’t have any good applications to profile, but I’d be interested to know if 
cherry picking this change to master makes things better or worse for you (or 
makes no difference - in which case it’s probably worth keeping because at 
least it will reduce lock contention on the memory allocator and memory usage).
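
To make the trade-off concrete, here is a minimal sketch of the two approaches 
in plain C++11.  It is not the code from the newabi branch: StringSource is a 
hypothetical stand-in for an NSString with a -getCharacters:range: analogue, 
the FNV-1a hash is chosen only because it is short (it is an assumption, not 
the hash GNUstep uses), and kStackChunk, hashByCopying and hashInChunks are 
names invented for this example.  Both functions feed the same code units to 
the hash in the same order, so they return the same value; the chunked one 
simply never allocates.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an NSString-like object (an assumption for this
// sketch): it exposes only a length and a bulk character fetch.
struct StringSource {
    std::u16string storage;
    std::size_t length() const { return storage.size(); }
    // Analogue of -getCharacters:range: (copies count code units to out).
    void getCharacters(char16_t *out, std::size_t location,
                       std::size_t count) const {
        std::copy_n(storage.data() + location, count, out);
    }
};

// Size of the on-stack buffer in UTF-16 code units (arbitrary choice).
static const std::size_t kStackChunk = 64;

// One FNV-1a step per byte of a UTF-16 code unit, low byte first.
inline std::uint32_t fnv1aStep(std::uint32_t h, char16_t c) {
    h ^= static_cast<std::uint32_t>(c & 0xff);
    h *= 16777619u;
    h ^= static_cast<std::uint32_t>(c >> 8);
    h *= 16777619u;
    return h;
}

// Copy the whole string to a heap buffer, then hash it.
inline std::uint32_t hashByCopying(const StringSource &s) {
    std::vector<char16_t> copy(s.length());
    s.getCharacters(copy.data(), 0, copy.size());
    std::uint32_t h = 2166136261u;  // FNV-1a 32-bit offset basis
    for (char16_t c : copy) h = fnv1aStep(h, c);
    return h;
}

// Hash through a fixed-size stack buffer, one chunk per fetch.
inline std::uint32_t hashInChunks(const StringSource &s) {
    char16_t buffer[kStackChunk];
    std::uint32_t h = 2166136261u;  // FNV-1a 32-bit offset basis
    const std::size_t length = s.length();
    for (std::size_t location = 0; location < length;) {
        const std::size_t count = std::min(kStackChunk, length - location);
        s.getCharacters(buffer, location, count);
        for (std::size_t i = 0; i < count; ++i) h = fnv1aStep(h, buffer[i]);
        location += count;
    }
    return h;
}
```

For short strings the chunked version makes a single fetch, just like the 
copying version, so the difference should only show up in allocator traffic 
and peak memory use.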

>> The ICU UText interface provides something very similar to 
>> -getCharacters:range: as its primitive method (a callback that fills a 
>> buffer with UTF-16 characters) and has some carefully optimised routines.
> Yes, I have been thinking about implementing an ICU subclass of NSString (on 
> platforms where ICU is available) for some time.  My assumption/hope is that 
> it might be both more correct (in odd parts of unicode that people writing 
> our stuff have been unaware of) and faster than our UTF16 code.  Even if 
> performance turned out to be poor, it would be good to have a reference 
> implementation for testing for correctness.

GSICUString ought to provide a basis for this.  I’ve cleaned up a bit of the 
code in the newabi branch and added a -rangeOfComposedCharacterSequenceAtIndex: 
to NSString that constructs an on-stack UText (with an on-stack buffer) and 
uses ICU’s break iterator to find character breaks:


Its behaviour appears to match Apple’s and most of the tests pass, but a bunch 
of the NSURLConnection tests are failing in a somewhat opaque way (these tests 
are not really intended for debugging this kind of problem, so they’re really 
just highlighting that we’re lacking some test coverage elsewhere).

