On Wednesday, August 17, 2011 08:59 Steven Schveighoffer wrote:
> On Wed, 17 Aug 2011 11:47:59 -0400, Jonathan M Davis <[email protected]>
> wrote:
> > On Wednesday, August 17, 2011 10:04:21 Steven Schveighoffer wrote:
> >> On Wed, 17 Aug 2011 00:15:38 -0400, Andrei Alexandrescu
> >> <[email protected]> wrote:
> >> > On 8/16/11 9:29 PM, bearophile wrote:
> >> >> Walter Bright:
> >> >>>> I think there are search trees like the Red-Black ones that
> >> >>>> guarantee a O(n ln n) worst case. Am I wrong?
> >> >>>
> >> >>> Just feed it more data.
> >> >>
> >> >> If you feed it more data, even if all items produce collisions
> >> >> because they all hash to the same bucket, if you use Red-Black
> >> >> trees to handle the items in the same bucket you keep having a
> >> >> O(n ln n) behaviour, that's usually fast enough. With Python and
> >> >> the new D AAs you instead get a quadratic one. This quadratic
> >> >> behaviour gives troubles way before the physical RAM is
> >> >> exhausted.
> >> >>
> >> >> Bye,
> >> >> bearophile
> >> >
> >> > Let's please stop this. Many of us, including yourself, noticed
> >> > the relatively poor performance of D's previous hashtables
> >> > compared to other languages. Switching to singly-linked-list
> >> > collision handling marked an improvement. Now a lot of data
> >> > structure designs have a worst-case that makes them perform worse
> >> > than others. If you worry about attacks, please implement your
> >> > own hashtable. If we switch back to the old implementation,
> >> > you'll complain again about D's hashtables being slower than
> >> > Python's, thus closing a years-long cycle.
> >>
> >> Yes, but let's not forget the one valid request out of all of this
> >> -- if trees are no longer being used, opEquals should be used
> >> instead of opCmp. This allows more possible key types (which don't
> >> define an ordering). I think this would be a simple druntime
> >> change.
> >
> > But then we can't change the hash table type to one that needs opCmp
> > if we need to later. That might be acceptable, but it makes it so
> > that we can't transparently change the implementation again if we
> > decide that we need to.
>
> I think that's a choice we should embrace. AFAIK, no *builtin* hash
> implementations use trees for buckets in any language I'm aware of
> (I'm sure someone will find one though :). The precedent is to require
> opHash and opEquals, not opCmp. It just makes more sense for builtin
> hash tables to allow the most possible key types it can.
>
> Also, currently, if opCmp doesn't exist the *COMPILER MAKES ONE UP*,
> which is totally unacceptable.
>
> So if you define opEquals and not opCmp, as bearophile points out,
> your specifically defined opEquals is not even used, and some made-up
> approximation is used instead!
>
> It's one thing to make up opEquals, that is pretty easy to get
> reasonably right. It's something entirely different to invent an
> opCmp, especially for types which have no ordering!

I'm not necessarily arguing that we should keep requiring opCmp (and having the compiler generate one is certainly not good, IMHO). I'm just pointing out that dropping the requirement would put further limitations on our ability to change the implementation later if we decide that we need to. As long as we think that the benefits outweigh the costs, removing the need for opCmp in AAs is probably what we should do. But regardless, having the compiler create an opCmp for you seems like highly broken behavior.

- Jonathan M Davis
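
To make the request concrete, here is a minimal sketch of the kind of key type under discussion: one with a hash and an equality test but no meaningful ordering. The `Color` struct and the `buildNames` helper are hypothetical names invented for illustration, and the sketch assumes a current D compiler, where a struct key only needs `toHash` and `opEquals`. That differs from the 2011 behavior described in this thread.

```d
import std.stdio : writeln;

// Hypothetical key type: colors can be compared for equality but have
// no natural ordering, so no opCmp is defined.
struct Color
{
    ubyte r, g, b;

    // Hash used to pick the bucket.
    size_t toHash() const @safe pure nothrow
    {
        return (cast(size_t) r << 16) | (cast(size_t) g << 8) | b;
    }

    // Equality used to tell colliding keys apart within a bucket.
    bool opEquals(const Color other) const @safe pure nothrow
    {
        return r == other.r && g == other.g && b == other.b;
    }
}

// Illustrative helper: builds a small AA keyed by Color.
string[Color] buildNames()
{
    string[Color] names;
    names[Color(255, 0, 0)] = "red";
    names[Color(0, 0, 255)] = "blue";
    return names;
}

void main()
{
    auto names = buildNames();
    writeln(names[Color(255, 0, 0)]);
    // Lookup of a missing key goes through toHash, then opEquals.
    writeln((Color(0, 255, 0) in names) is null);
}
```

The trade-off raised above still applies: a table that accepts keys like this cannot later switch to tree-based buckets without requiring opCmp again.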
