On Wed, 17 Aug 2011 11:47:59 -0400, Jonathan M Davis <[email protected]> wrote:

On Wednesday, August 17, 2011 10:04:21 Steven Schveighoffer wrote:
On Wed, 17 Aug 2011 00:15:38 -0400, Andrei Alexandrescu <[email protected]> wrote:
> On 8/16/11 9:29 PM, bearophile wrote:
>> Walter Bright:
>>>> I think there are search trees like the Red-Black ones that
>>>> guarantee an O(n ln n) worst case. Am I wrong?
>>>
>>> Just feed it more data.
>>
>> If you feed it more data, even if all items produce collisions because
>> they all hash to the same bucket, if you use Red-Black trees to handle
>> the items in the same bucket you keep O(n ln n) behaviour, which is
>> usually fast enough. With Python and the new D AAs you instead get
>> quadratic behaviour. This quadratic behaviour causes trouble well
>> before the physical RAM is exhausted.
>>
>> Bye,
>> bearophile
>
> Let's please stop this. Many of us, including yourself, noticed the
> relatively poor performance of D's previous hashtables compared to other
> languages. Switching to singly-linked-list collision handling marked an
> improvement. Now, a lot of data structure designs have a worst case that
> makes them perform worse than others. If you worry about attacks, please
> implement your own hashtable. If we switch back to the old
> implementation, you'll complain again about D's hashtables being slower
> than Python's, thus closing a years-long cycle.
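To make the complexity argument concrete, here is a rough Python sketch (not D's druntime code) that forces every key into a single bucket, as if all hashes collided, and counts key comparisons. `insert_chained` scans the bucket with `==`, like singly-linked-list chaining; `insert_sorted` uses binary search with `<` as a stand-in for a red-black-tree bucket. All names are hypothetical, and only comparisons are counted: the Python list insert still shifts elements, which a real tree would avoid.

```python
import bisect

class CountingKey:
    """Wraps an int and counts every == and < comparison made on it."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        CountingKey.comparisons += 1
        return self.value == other.value

    def __lt__(self, other):
        CountingKey.comparisons += 1
        return self.value < other.value

def insert_chained(bucket, key):
    """Chained bucket: linear scan using == only, then append."""
    for existing in bucket:
        if existing == key:
            return
    bucket.append(key)

def insert_sorted(bucket, key):
    """Ordered bucket (tree stand-in): binary search using <."""
    i = bisect.bisect_left(bucket, key)
    if i < len(bucket) and bucket[i] == key:
        return
    bucket.insert(i, key)  # shifts elements, but makes no comparisons

def count_comparisons(insert, n):
    CountingKey.comparisons = 0
    bucket = []  # every key lands here, as if all hashes collide
    for v in range(n):
        insert(bucket, CountingKey(v))
    return CountingKey.comparisons

n = 1000
print(count_comparisons(insert_chained, n))  # 499500, i.e. n*(n-1)/2
print(count_comparisons(insert_sorted, n))   # roughly n*log2(n)
```

The chained bucket pays n*(n-1)/2 comparisons for n distinct colliding keys, while the ordered bucket stays near n*log2(n). The catch is that the ordered bucket only works if keys have an ordering.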

Yes, but let's not forget the one valid request out of all of this -- if
trees are no longer being used, opEquals should be used instead of opCmp.
This allows more possible key types (which don't define an ordering). I
think this would be a simple druntime change.
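A rough Python sketch of this point, assuming a separate-chaining table whose buckets are scanned linearly (`ChainedHashMap` and `Point` are hypothetical names, not druntime's). Lookups need only a hash and an equality test, so a key type with no ordering works fine.

```python
class ChainedHashMap:
    """Toy separate-chaining hash table. Buckets are lists scanned with ==;
    keys need only __hash__ and __eq__, and no ordering (<) is ever used."""

    def __init__(self, nbuckets=16):
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:  # equality only
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def __getitem__(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)


class Point:
    """A key type with equality and a hash but no meaningful ordering."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))


m = ChainedHashMap()
m[Point(1, 2)] = "a"
m[Point(1, 2)] = "b"   # equal key: overwrites rather than adding a second entry
print(m[Point(1, 2)])  # b
```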

But then we can't change the hash table type to one that needs opCmp if we
need to later. That might be acceptable, but it makes it so that we can't
transparently change the implementation again if we decide that we need to.

I think that's a choice we should embrace. No *builtin* hash implementation in any language I'm aware of uses trees for buckets (I'm sure someone will find one, though :). The precedent is to require toHash and opEquals, not opCmp. It just makes more sense for builtin hash tables to allow as many key types as possible.

Also, currently, if opCmp doesn't exist, the *COMPILER MAKES ONE UP*, which is totally unacceptable.

So if you define opEquals and not opCmp, as bearophile points out, your specifically defined opEquals is not even used, and some made-up approximation is used instead!

It's one thing to make up opEquals; that is pretty easy to get reasonably right. It's something entirely different to invent an opCmp, especially for types that have no ordering!
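A rough Python illustration of why an invented ordering is dangerous, assuming the invented comparison simply compares the raw stored field (a hypothetical stand-in, not what the D compiler actually generates). The key below defines case-insensitive equality, and a bucket searched by the made-up ordering misses a key that the user's own `__eq__` considers equal.

```python
import bisect

class Name:
    """Key whose user-defined equality (its "opEquals") is case-insensitive."""

    def __init__(self, s):
        self.s = s

    def __eq__(self, other):
        return isinstance(other, Name) and self.s.lower() == other.s.lower()

    def __hash__(self):
        return hash(self.s.lower())

def made_up_cmp_key(name):
    # Hypothetical invented ordering: compares the raw field and knows
    # nothing about the user's case folding.
    return name.s

def ordered_bucket_contains(bucket, key):
    """Binary search by the made-up ordering; Name.__eq__ is never called."""
    keys = sorted(made_up_cmp_key(k) for k in bucket)
    probe = made_up_cmp_key(key)
    i = bisect.bisect_left(keys, probe)
    return i < len(keys) and keys[i] == probe

def chained_bucket_contains(bucket, key):
    """Linear scan using the user's own __eq__."""
    return any(k == key for k in bucket)

bucket = [Name("alice")]
probe = Name("ALICE")
print(probe == bucket[0])                      # True: equal by the user's definition
print(chained_bucket_contains(bucket, probe))  # True
print(ordered_bucket_contains(bucket, probe))  # False: the invented ordering disagrees
```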

-Steve
