"Gerd Stolpmann" <i...@gerd-stolpmann.de> writes:

>> Gerd Stolpmann <i...@gerd-stolpmann.de> writes:
>> You are forgetting a variable in this. To create a DOS in the hashtable
>> you need to hit the same bucket again and again. And bucket = hash % size.
>>
>> You focused on the hash but lets talk about the size for a moment.
>>
>> The size is rather limited and large hashtabels simply won't increate
>> size anymore and allways have lots of collisions. So even if someone
>> isn't actively trying to create DOS the performace still tanks as you
>> add more items.
>
> I cannot completely follow here. If you add more elements, the bucket
> array is resized. The ocaml Hashtbl does this when the number of elements
> exceeds half of the array size. The argument of Hashtbl.create is only the
> initial size of the bucket array.
>
> The upper bound is Sys.max_array_length, of course. Granted, on 32 bit
> platforms it is low (4M). But nobody of sound mind would run ocaml
> programs on 32 bit for production purposes. (If you do, consider to
> switch. You can use more memory and get a little performance boost, not
> only for ocaml programs.)
>
> Gerd

In practice I see a serious performance degradation with large
hashtables. So much so that I had to use an array of hashtables or
hashtable of hashtables and split the values across them.

Maybe the size of the hashtable indeed isn't the problem but the hash
function. Here is something odd:

# for i = 1 to 63 do Printf.printf "%d %d\n" (1 lsl i) (Hashtbl.hash (1 lsl i)) 
done;;
2 2
4 4
8 8
16 16
32 32
64 64
128 128
256 256
512 512
1024 1024
2048 2048
4096 4096
8192 8192
16384 16384
32768 32768
65536 65536
131072 131072
262144 262144
524288 524288
1048576 1048576
2097152 2097152
4194304 4194304
8388608 8388608
16777216 16777216
33554432 33554432
67108864 67108864
134217728 134217728
268435456 268435456
536870912 536870912
1073741824 0
2147483648 0
4294967296 0
8589934592 0
17179869184 0
34359738368 0
68719476736 0
137438953472 0
274877906944 0
549755813888 0
1099511627776 0
2199023255552 0
4398046511104 0
8796093022208 0
17592186044416 0
35184372088832 0
70368744177664 0
140737488355328 0
281474976710656 0
562949953421312 0
1125899906842624 0
2251799813685248 0
4503599627370496 0
9007199254740992 0
18014398509481984 0
36028797018963968 0
72057594037927936 0
144115188075855872 0
288230376151711744 0
576460752303423488 0
1152921504606846976 0
2305843009213693952 0
-4611686018427387904 0
0 0
- : unit = ()

Is every value over a billion hashed to 0?

MfG
        Goswin

-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Reply via email to