If the testsuite runs correctly, you can go ahead and apply this patch. If possible, could you add more tests to the testsuite for these particular cases? I don't want to inadvertently mess this up when I go into this code in the future.
Based on Richard's answer, I think I was way off base, anyway.

On Tue, Aug 6, 2013 at 8:36 AM, Luboš Doležel <lu...@dolezel.info> wrote:

> Yes, I've just noticed that once I force using UTF-16 in CFStringHash(),
> then -hash and CFStringHash() give the same value. The question is if it
> holds for all other bridged types.
>
> Until a better/permanent solution is found, do you think the changes
> forcing UTF-16 in CFStringHash() are acceptable? I'm currently having
> problems implementing IOKit, because CFDictionary doesn't return the values
> for keys I give to it :-(
>
> Luboš
>
> On Tue, 6 Aug 2013 08:30:10 -0500, Stefan Bidi wrote:
>
>> I copied the hash algorithm straight out of -base, so they should
>> match. I remember a few months ago Richard was playing around with
>> hash functions and this might be causing some issues, now.
>>
>> I just looked it up, the changes were made on rev 36344.
>>
>> There is another issue... -base allows UTF-8 strings, which will not
>> be hashed to the same UTF-16 value. In my opinion, allowing UTF-8
>> string literals is not a good idea and base should revert back to
>> Latin1 as the default C string encoding. I'm actually debating
>> adding a UTF-16 string literals configure option for corebase. I
>> believe using UTF-16 internally is the only sane solution to non-ASCII
>> encodings.
>>
>> I've tried experimenting with other hash functions that are not
>> one-at-a-time, but unfortunately have not found anything that will
>> work on both ASCII and Unicode strings consistently. It would be
>> really nice to be able to work with 32- or 64-bit integers directly
>> instead of 8- or 16-bit characters. If we could use UTF-16 across the
>> board, this wouldn't be a problem.
>>
>> Anyway, those are my thoughts.
>>
>> On Tue, Aug 6, 2013 at 8:14 AM, Luboš Doležel wrote:
>>
>>> Hello,
>>>
>>> hash computation with Toll-Free Bridging is a tricky subject. Do
>>> it wrong and you'll get all sorts of trouble, especially with
>>> dictionaries, which use hashes a lot.
>>>
>>> The code in corebase currently dispatches all CFHash() calls on
>>> ObjC objects to -hash, which is bad. The following expectation
>>> breaks due to this dispatch:
>>>
>>> CFHash(@"string") == CFHash(CFSTR("string"))
>>>
>>> because NSString uses a different hashing algorithm than CFString.
>>> My suggestion is to do away with the ObjC dispatch in CFHash() and
>>> alter all the CF*Hash() functions to support ObjC types.
>>>
>>> While looking at CFStringHash(), I've also noticed that either
>>> 8-bit or 16-bit raw character data is used for hashing based on
>>> what is available. I believe this breaks the following case:
>>>
>>> ===
>>> CFStringRef str1 = CFSTR("str");
>>> CFStringRef str2 = CFStringCreateWithCharacters(NULL, (UniChar*)
>>> "str", 3); // "str" in UTF-16
>>>
>>> CFHash(str1) == CFHash(str2);
>>> ===
>>>
>>> While the two strings are obviously identical, different bytes are
>>> used to generate the hash in both cases.
>>>
>>> This problem can be solved by converting the character data to
>>> Unicode first, which has a performance impact, but only once for
>>> every CFString.
>>>
>>> The situation with CFHash() calls on NSStrings is worse, since
>>> corebase has nowhere to save the calculated hash, so it must be
>>> recalculated every time. But I think it's better to be slow than to
>>> be wrong. Please review the attached patch and let me know if you
>>> have any observations.
>>>
>>> --
>>> Luboš Doležel
>>
>> Links:
>> ------
>> [1] mailto:lu...@dolezel.info
>
> --
> Luboš Doležel
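
To make the 8-bit versus 16-bit point from the quoted thread concrete, here is a minimal sketch in plain C. It does not use corebase or -base at all: oaat_hash() is a generic Jenkins-style one-at-a-time hash chosen as a stand-in (not the code from either library), and hash_raw8(), hash_raw16(), hash_widened() and MAX_UNITS are hypothetical names made up for this illustration. The widening step assumes the 8-bit data is ASCII or Latin-1, where each byte maps directly to the same UTF-16 code unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in hash (assumption): Jenkins one-at-a-time over raw bytes. */
uint32_t oaat_hash(const unsigned char *bytes, size_t len)
{
    uint32_t h = 0;
    for (size_t i = 0; i < len; i++) {
        h += bytes[i];
        h += h << 10;
        h ^= h >> 6;
    }
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}

/* Hash the raw storage of an 8-bit string: n bytes go into the hash. */
uint32_t hash_raw8(const char *s, size_t n)
{
    return oaat_hash((const unsigned char *)s, n);
}

/* Hash the raw storage of a 16-bit string: 2 * n bytes go into the hash. */
uint32_t hash_raw16(const uint16_t *s, size_t n)
{
    return oaat_hash((const unsigned char *)s, n * sizeof *s);
}

/* Hypothetical cap so the sketch can use a fixed stack buffer. */
enum { MAX_UNITS = 64 };

/* Widen ASCII/Latin-1 bytes to 16-bit code units first, then hash those.
 * This is the "always hash UTF-16" idea from the thread, in miniature. */
uint32_t hash_widened(const char *s, size_t n)
{
    uint16_t units[MAX_UNITS];
    assert(n <= MAX_UNITS);
    for (size_t i = 0; i < n; i++)
        units[i] = (unsigned char)s[i];
    return hash_raw16(units, n);
}
```

Hashing "str" through hash_raw8() feeds three bytes to the hash, while hash_raw16() on the same text feeds six, so the two results will in general differ even though the strings compare equal. hash_widened("str", 3) and hash_raw16() on the code units 's', 't', 'r' see identical input and therefore agree, which is the property a dictionary lookup relies on.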
_______________________________________________
Gnustep-dev mailing list
Gnustep-dev@gnu.org
https://lists.gnu.org/mailman/listinfo/gnustep-dev