I think it is a well known bug of the hash function:

let list = ref [];;
for i = 1 to 30 do
  list := i :: !list;
  Printf.printf "%d " (Hashtbl.hash !list)
done;;

1 65601 8392706 797307010 578301955 797307010 8392706 65601 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

As you can see, long lists all have the same hash value. It is due to
the fact that the hash function does a depth-first traversal of the
datastructure, starting with the last field, instead of a breadth-first
traversal, and, as it only visits 10 nodes, they are always the same
ones for all lists of length > 10.

Thus, you should consider using your own hash function, probably only
considering the ints in the list and its length. Then, use the functor
in the Hashtbl module.

--Fabrice

On 07/27/2011 11:07 PM, Toby Kelsey wrote:
> I have written two versions of a small program, one using an (int list) data 
> structure and the other an ((int*int) list); and I tested using Hashtbl, Set 
> and (Jean-Christophe Filliatre's) Trie to cache these elements in each 
> version.
> The relative run times of the programs turns out to be:
> 
>                  Hashtbl  Set  Trie
> (int list)         1      2.1  2.0
> ((int*int) list)  34.7    2.6  2.1
> 
> 
> There is a slight inherent speed difference between the 2 versions but the 
> major effect is the cache type (in fact the caching difference must be larger 
> than these ratios due to non-cache computations). I expected Hashtbl might be 
> a bit slower than more specialised data structures, but the large speed 
> difference with different data structures was unexpected. Presumably the 
> slow-down is due to excessive hash collisions.
> 
> I had expected that the generic Hashtbl would be written to give adequate 
> speed for all types of data and when I look at OCaml code on the web Hashtbl 
> is usually used, so it seems most OCaml programmers believe the standard 
> Hashtbl is a reasonable choice for most data types as well.
> 
> I haven't tested the Batteries or OCaml-Core hash tables so these may be more 
> consistent, but if not, my question is can you predict how well Hashtbl will 
> work for different types of data and so what to use it with, or it is just 
> not reliable enough for general-purpose caching/memoization?
> 
> If anyone wants to look at the code, the (int list) Hashtbl version is at 
> <http://rosettacode.org/wiki/Sokoban#OCaml> and the other versions are in 
> <http://pastebin.com/hGn1AL9L> temporarily. Apart from the caching and the 
> int/int pair changes they should be identical.
> 
> Toby
> 

-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

<<attachment: fabrice_le_fessant.vcf>>

Reply via email to