Re: Hash computation and TFB

Stefan Bidi Tue, 06 Aug 2013 06:44:07 -0700

If the testsuite runs correctly, you can go ahead and apply this patch. If
possible, could you add more tests to the testsuite for these particular
cases?  I don't want to inadvertently mess this up when I go into this code
in the future.


Based on Richard's answer, I think I was way off base, anyway.


On Tue, Aug 6, 2013 at 8:36 AM, Luboš Doležel <lu...@dolezel.info> wrote:

> Yes, I've just noticed that once I force CFStringHash() to use UTF-16,
> -hash and CFStringHash() give the same value. The question is whether
> that holds for all other bridged types.
>
> Until a better/permanent solution is found, do you think the changes
> forcing UTF-16 in CFStringHash() are acceptable? I'm currently having
> problems implementing IOKit, because CFDictionary doesn't return the values
> for keys I give to it :-(
>
> Luboš
>
> On Tue, 6 Aug 2013 08:30:10 -0500, Stefan Bidi wrote:
>
>> I copied the hash algorithm straight out of -base, so they should
>> match.  I remember that a few months ago Richard was experimenting
>> with hash functions, and that might be causing some issues now.
>>
>> I just looked it up: the changes were made in rev 36344.
>>
>> There is another issue... -base allows UTF-8 strings, which will not
>> be hashed to the same value as the equivalent UTF-16 strings.  In my
>> opinion, allowing UTF-8 string literals is not a good idea, and base
>> should revert to Latin-1 as the default C string encoding.  I'm
>> actually debating adding a UTF-16 string literals configure option
>> for corebase.  I believe using UTF-16 internally is the only sane
>> solution for non-ASCII encodings.
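One way to see why Latin-1 is friendlier than UTF-8 to a hash computed over UTF-16: Latin-1 bytes map one-to-one onto U+0000..U+00FF, so widening each byte yields exactly the code units a UTF-16 string stores, while UTF-8 only has that property for ASCII. The sketch below is plain C; `widen_bytes()` and `same_units()` are hypothetical helpers, and `UniChar` is a local stand-in for the CoreFoundation typedef.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t UniChar; /* stand-in for the CoreFoundation typedef */

/* Hypothetical helper: widen each byte to one 16-bit unit, i.e. treat
 * the 8-bit buffer as Latin-1 before hashing it as UTF-16. */
static void widen_bytes(const unsigned char *in, size_t len, UniChar *out)
{
    assert(in != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        out[i] = (UniChar)in[i];
}

/* Returns 1 if the widened byte string equals the UTF-16 buffer,
 * i.e. if a UTF-16 hash would see identical input for both. */
static int same_units(const unsigned char *bytes, size_t nbytes,
                      const UniChar *utf16, size_t nunits)
{
    UniChar tmp[64];
    if (nbytes != nunits || nbytes > 64)
        return 0;
    widen_bytes(bytes, nbytes, tmp);
    return memcmp(tmp, utf16, nunits * sizeof(UniChar)) == 0;
}
```

For example, "é" (U+00E9) is the single byte 0xE9 in Latin-1 and widens to the right unit, but in UTF-8 it is the two bytes 0xC3 0xA9, which widen to two unrelated units.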
>>
>> I've tried experimenting with other hash functions that are not
>> one-at-a-time, but unfortunately have not found anything that will
>> work on both ASCII and Unicode strings consistently.  It would be
>> really nice to be able to work with 32- or 64-bit integers directly
>> instead of 8- or 16-bit characters.  If we could use UTF-16 across
>> the board, this wouldn't be a problem.
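For reference, a one-at-a-time hash consumes one character per round. The sketch below is Bob Jenkins' classic one-at-a-time function, adapted to take whole 16-bit UTF-16 units instead of bytes; it illustrates the family Stefan mentions and is an assumption, not the exact function in -base or corebase. Because it only ever sees `UniChar` values, two strings with the same UTF-16 contents hash alike however they were stored. Processing 32- or 64-bit words at a time, as Stefan would like, would require every string to be in the same representation first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t UniChar; /* stand-in for the CoreFoundation typedef */

/* Jenkins one-at-a-time hash, fed UTF-16 code units instead of bytes.
 * Illustrative only; not the -base/corebase implementation. */
static uint32_t oaat_hash_utf16(const UniChar *chars, size_t len)
{
    uint32_t h = 0;
    assert(chars != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        h += chars[i];   /* mix in one code unit per round */
        h += h << 10;
        h ^= h >> 6;
    }
    /* Final avalanche so that late characters affect all bits. */
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}
```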
>>
>> Anyway, those are my thoughts.
>>
>> On Tue, Aug 6, 2013 at 8:14 AM, Luboš Doležel  wrote:
>>
>>> Hello,
>>>
>>> hash computation with Toll-Free Bridging is a tricky subject. Do
>>> it wrong and you'll get all sorts of trouble, especially with
>>> dictionaries, which use hashes a lot.
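Why this matters for dictionaries: a hash table uses the hash only to pick a bucket and checks equality only inside that bucket. The toy table below (hypothetical `toy_table`, `toy_insert`, `toy_lookup`; one slot per bucket, collisions simply overwrite) shows the failure mode: an equal key presented with a different hash is never compared at all. This is consistent with the CFDictionary lookup failures Luboš reports in his follow-up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBUCKETS 8

/* Toy table, one slot per bucket (collisions overwrite): just enough
 * to show that the hash decides where equality is even checked. */
struct toy_table {
    const char *keys[NBUCKETS];
    int values[NBUCKETS];
};

static void toy_insert(struct toy_table *t, uint32_t hash,
                       const char *key, int value)
{
    size_t b = hash % NBUCKETS;
    assert(key != NULL);
    t->keys[b] = key;
    t->values[b] = value;
}

/* Returns 1 and stores the value if found. Keys are compared only in
 * the bucket chosen by the hash, so a mismatched hash means a miss. */
static int toy_lookup(const struct toy_table *t, uint32_t hash,
                      const char *key, int *value)
{
    size_t b = hash % NBUCKETS;
    if (t->keys[b] != NULL && strcmp(t->keys[b], key) == 0) {
        *value = t->values[b];
        return 1;
    }
    return 0;
}
```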
>>>
>>> The code in corebase currently dispatches all CFHash() calls on
>>> ObjC objects to -hash, which is a problem: this dispatch breaks the
>>> following expectation:
>>>
>>> CFHash(@"string") == CFHash(CFSTR("string"))
>>>
>>> because NSString uses a different hashing algorithm than CFString.
>>> My suggestion is to do away with the ObjC dispatch in CFHash() and
>>> alter all the CF*Hash() functions to support ObjC types.
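The proposal can be modelled in plain C. The names below (`any_string`, `model_cfhash`, `model_cfhash_dispatching`) are hypothetical, not corebase API, and a placeholder `h * 31 + c` loop stands in for the real character hash. The point is structural: if CFHash() defers to whatever an ObjC object's -hash returns, a bridged string and a native CFString with identical contents can disagree; if every string goes through one character-based hash, they cannot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t UniChar; /* stand-in for the CoreFoundation typedef */

/* Hypothetical model: a string is either a native CFString or a
 * bridged ObjC object; both expose their UTF-16 contents. */
enum str_origin { ORIGIN_CF, ORIGIN_OBJC };

struct any_string {
    enum str_origin origin;
    const UniChar *chars;  /* UTF-16 contents */
    size_t length;
    uint32_t objc_hash;    /* what an ObjC -hash would return */
};

/* Placeholder character hash; the real one is whatever corebase uses. */
static uint32_t hash_units(const UniChar *chars, size_t len)
{
    uint32_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = h * 31u + chars[i];
    return h;
}

/* Current behaviour criticised above: ObjC objects go to -hash. */
static uint32_t model_cfhash_dispatching(const struct any_string *s)
{
    assert(s != NULL);
    if (s->origin == ORIGIN_OBJC)
        return s->objc_hash;
    return hash_units(s->chars, s->length);
}

/* Proposed behaviour: no ObjC dispatch; hash the characters the same
 * way whatever the object's origin. */
static uint32_t model_cfhash(const struct any_string *s)
{
    assert(s != NULL);
    return hash_units(s->chars, s->length);
}
```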
>>>
>>> While looking at CFStringHash(), I've also noticed that either
>>> 8-bit or 16-bit raw character data is used for hashing, depending
>>> on what is available. I believe this breaks the following case:
>>>
>>> ===
>>> static const UniChar chars[] = { 's', 't', 'r' }; // "str" in UTF-16
>>> CFStringRef str1 = CFSTR("str");
>>> CFStringRef str2 = CFStringCreateWithCharacters(NULL, chars, 3);
>>>
>>> CFHash(str1) == CFHash(str2); // should hold, but does not
>>> ===
>>>
>>> While the two strings are obviously identical, different bytes are
>>> used to generate the hash in both cases.
>>>
>>> This problem can be solved by converting the character data to
>>> Unicode (UTF-16) first, which has a performance cost, but only once
>>> per CFString.
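A minimal sketch of the conversion described above, assuming a simplified string record that holds either 8-bit (ASCII/Latin-1) bytes or UTF-16 units plus a cached hash. `model_string` and `model_string_hash` are hypothetical names, and the `h * 31 + c` loop is a placeholder for the real hash. The 8-bit buffer is widened to UTF-16 units before hashing, so strings like `str1` and `str2` in the example above hash alike, and the cache means the widening cost is paid on the first call only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint16_t UniChar; /* stand-in for the CoreFoundation typedef */

/* Hypothetical string record: exactly one of bytes/units is non-NULL. */
struct model_string {
    const unsigned char *bytes; /* 8-bit storage, assumed Latin-1 */
    const UniChar *units;       /* UTF-16 storage */
    size_t length;
    bool hash_valid;
    uint32_t hash;
};

/* Placeholder hash over UTF-16 code units. */
static uint32_t hash_utf16(const UniChar *chars, size_t len)
{
    uint32_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = h * 31u + chars[i];
    return h;
}

/* Always hash UTF-16 units; widen 8-bit storage first, then cache. */
static uint32_t model_string_hash(struct model_string *s)
{
    if (s->hash_valid)
        return s->hash;
    if (s->units != NULL) {
        s->hash = hash_utf16(s->units, s->length);
    } else {
        UniChar *tmp = malloc(s->length * sizeof *tmp);
        assert(tmp != NULL || s->length == 0);
        for (size_t i = 0; i < s->length; i++)
            tmp[i] = s->bytes[i]; /* Latin-1 byte -> same code point */
        s->hash = hash_utf16(tmp, s->length);
        free(tmp);
    }
    s->hash_valid = true;
    return s->hash;
}
```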
>>>
>>> The situation with CFHash() calls on NSStrings is worse, since
>>> corebase has nowhere to save the calculated hash, so it must be
>>> recalculated every time. But I think it's better to be slow than to
>>> be wrong. Please review the attached patch and let me know if you
>>> have any observations.
>>>
>>> --
>>> Luboš Doležel
>>>
>>
>
> --
> Luboš Doležel
>

_______________________________________________
Gnustep-dev mailing list
Gnustep-dev@gnu.org
https://lists.gnu.org/mailman/listinfo/gnustep-dev
