On Friday, 30.12.2011, at 15:52 -0500, Yaron Minsky wrote:
> It's not clever in that way.  It does try to do a good job of keeping
> the memory impact of the tree low, but you maintain O(1) by having a
> low load factor, and therefore trees of constant size.  You can take a
> look at the code here:
> (Don't rely on that repo too much yet, btw.  We're probably going to
> blow it away and create a new one in the next couple of days.  But
> going forward, we plan on using bitbucket as a place to work together
> with the community on Core.)

Interesting solution, at least. If I read it correctly, the Avltree has
special representations for empty and 1-node trees, so with some luck
this covers 99% of the array cells. The memory footprint will therefore
usually not be higher than for conventional hash tables.
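
To illustrate what I mean by special representations, here is a rough
sketch. The type and the function names are my own invention for this
mail and are not the actual Avltree definition from Core; rebalancing
on insertion is left out entirely.

```ocaml
(* Hypothetical bucket type, NOT the real Core Avltree definition. *)
type ('k, 'v) bucket =
  | Empty
      (* constant constructor: an immediate value, no heap block *)
  | Leaf of 'k * 'v
      (* exactly one binding: a single small block *)
  | Node of ('k, 'v) bucket * 'k * 'v * int * ('k, 'v) bucket
      (* left, key, value, height, right: only needed on collisions *)

(* Lookup in one bucket; [compare] is the key ordering supplied by the caller. *)
let rec find compare key = function
  | Empty -> None
  | Leaf (k, v) -> if compare key k = 0 then Some v else None
  | Node (l, k, v, _, r) ->
      let c = compare key k in
      if c = 0 then Some v
      else if c < 0 then find compare key l
      else find compare key r
```

With a low load factor almost every array cell would be Empty or Leaf,
so the Node case is only paid for when keys actually collide.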

So I'd consider Core_hashtbl an option when you need strong protection
against pathological cases but want to stay close to the performance
pattern of Hashtbl.

However, this is an area where we need to make compromises. I guess
that Core_hashtbl is, in the non-pathological cases, slower by some
factor than the more lightweight Hashtbl. This raises the question of
how it compares with other solutions, e.g. a Map whose keys are first
compared by their hash before cmp is called, for instance:

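(* Use the first comparison result unless it is a tie. *)
let (++) x f = if x<>0 then x else f()

module HString = struct
  type t = int * string   (* (hash of the key, the key itself) *)
  let compare (h1,s1) (h2,s2) =
    compare h1 h2 ++ (fun () -> compare s1 s2)
end

module HStringMap = Map.Make(HString)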

Now use it as: HStringMap.add (Hashtbl.hash key, key) value map

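This mitigates one of the drawbacks of the normal Map, namely that a
lookup has to compare many keys in full (which can be costly): most
comparisons are now decided by the integer hash alone, and the strings
are only compared when the hashes are equal.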
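
For completeness, a self-contained variant that compiles as-is. The
helpers hkey, add and find_opt are names I made up here to hide the
hashing from the caller; they are not part of any library. It assumes
a reasonably recent OCaml (Int.compare and Map's find_opt).

```ocaml
(* Use the first comparison result unless it is a tie. *)
let ( ++ ) x f = if x <> 0 then x else f ()

module HString = struct
  type t = int * string (* (hash of the key, the key itself) *)

  (* Compare the cheap integer hash first, the string only on equal hashes. *)
  let compare ((h1, s1) : t) ((h2, s2) : t) =
    Int.compare h1 h2 ++ (fun () -> String.compare s1 s2)
end

module HStringMap = Map.Make (HString)

(* Hypothetical convenience wrappers, so callers never build the pair. *)
let hkey (s : string) : HString.t = (Hashtbl.hash s, s)
let add key value map = HStringMap.add (hkey key) value map
let find_opt key map = HStringMap.find_opt (hkey key) map

let headers =
  HStringMap.empty
  |> add "content-type" "text/html"
  |> add "host" "example.org"
```

Here find_opt "host" headers yields Some "example.org". Note that the
map is then ordered by hash, so iterating over it no longer visits the
keys in alphabetical order.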


> y
> On Fri, Dec 30, 2011 at 2:01 PM, David Allsopp wrote:
>         Yaron Minsky wrote:
>         > For just this reason, the hashtables in Core have been
>         > reimplemented to use an AVL tree in the buckets.  That way,
>         > even when you have pathological collisions, you degrade
>         > gracefully to O(log n) per operation, instead of O(n),
>         > where n is the number of keys in the hashtable.
>         I'm resisting the temptation to hack-it-and-see: does your
>         implementation do anything clever to maintain Hashtbl's O(1)
>         insertion time (e.g. Hashtbl.add updates a list and then the
>         first call to Hashtbl.find or Hashtbl.mem "moves" any items
>         from the list to the AVL). Or does doing that impact "general"
>         performance too much?
>         In the POST web scenario (or processing HTTP request headers),
>         for example, "degrading" Hashtbl.add from O(1) to O(log n) is
>         only acceptable if you know that you'll query all the fields
>         in the POST (which isn't necessarily true).
>         David
