Well, looks like that the upper 32 bits aren't considered at all for computing the hash value. This is for sure surprising - I would have expected that these bits somehow go in to the final value (e.g. that they are xor-ed with the lower half).
My guess is that this was done to ensure that an int-to-something hashtable can be marshalled from 64 to 32 bit systems. Not sure for what this is important, but it looks like a nice symmetry. In trunk the method for ensuring this seems to have changed (see function caml_hash_mix_intnat in http://caml.inria.fr/cgi-bin/viewvc.cgi/ocaml/trunk/byterun/hash.c?revision=12072&view=markup). Gerd > "Gerd Stolpmann" <i...@gerd-stolpmann.de> writes: > >>> Gerd Stolpmann <i...@gerd-stolpmann.de> writes: >>> You are forgetting a variable in this. To create a DOS in the hashtable >>> you need to hit the same bucket again and again. And bucket = hash % >>> size. >>> >>> You focused on the hash but lets talk about the size for a moment. >>> >>> The size is rather limited and large hashtabels simply won't increate >>> size anymore and allways have lots of collisions. So even if someone >>> isn't actively trying to create DOS the performace still tanks as you >>> add more items. >> >> I cannot completely follow here. If you add more elements, the bucket >> array is resized. The ocaml Hashtbl does this when the number of >> elements >> exceeds half of the array size. The argument of Hashtbl.create is only >> the >> initial size of the bucket array. >> >> The upper bound is Sys.max_array_length, of course. Granted, on 32 bit >> platforms it is low (4M). But nobody of sound mind would run ocaml >> programs on 32 bit for production purposes. (If you do, consider to >> switch. You can use more memory and get a little performance boost, not >> only for ocaml programs.) >> >> Gerd > > In practice I see a serious performance degradation with large > hashtables. So much so that I had to use an array of hashtables or > hashtable of hashtables and split the values across them. > > Maybe the size of the hashtable indeed isn't the problem but the hash > function. Here is something odd: > > # for i = 1 to 63 do Printf.printf "%d %d\n" (1 lsl i) (Hashtbl.hash (1 > lsl i)) done;; > 2 2 > 4 4 > 8 8 > 16 16 > 32 32 > 64 64 > 128 128 > 256 256 > 512 512 > 1024 1024 > 2048 2048 > 4096 4096 > 8192 8192 > 16384 16384 > 32768 32768 > 65536 65536 > 131072 131072 > 262144 262144 > 524288 524288 > 1048576 1048576 > 2097152 2097152 > 4194304 4194304 > 8388608 8388608 > 16777216 16777216 > 33554432 33554432 > 67108864 67108864 > 134217728 134217728 > 268435456 268435456 > 536870912 536870912 > 1073741824 0 > 2147483648 0 > 4294967296 0 > 8589934592 0 > 17179869184 0 > 34359738368 0 > 68719476736 0 > 137438953472 0 > 274877906944 0 > 549755813888 0 > 1099511627776 0 > 2199023255552 0 > 4398046511104 0 > 8796093022208 0 > 17592186044416 0 > 35184372088832 0 > 70368744177664 0 > 140737488355328 0 > 281474976710656 0 > 562949953421312 0 > 1125899906842624 0 > 2251799813685248 0 > 4503599627370496 0 > 9007199254740992 0 > 18014398509481984 0 > 36028797018963968 0 > 72057594037927936 0 > 144115188075855872 0 > 288230376151711744 0 > 576460752303423488 0 > 1152921504606846976 0 > 2305843009213693952 0 > -4611686018427387904 0 > 0 0 > - : unit = () > > Is every value over a billion hashed to 0? > > MfG > Goswin > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa-roc.inria.fr/wws/info/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > > > -- Gerd Stolpmann, Darmstadt, Germany g...@gerd-stolpmann.de Creator of GODI and camlcity.org. Contact details: http://www.camlcity.org/contact.html Company homepage: http://www.gerd-stolpmann.de *** Searching for new projects! Need consulting for system *** programming in Ocaml? Gerd Stolpmann can help you. -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs