I have written two versions of a small program, one using an (int list) data 
structure and the other an ((int*int) list); and I tested using Hashtbl, Set 
and (Jean-Christophe Filliatre's) Trie to cache these elements in each version.
The relative run times of the programs turns out to be:

                 Hashtbl  Set  Trie
(int list)         1      2.1  2.0
((int*int) list)  34.7    2.6  2.1


There is a slight inherent speed difference between the 2 versions but the 
major effect is the cache type (in fact the caching difference must be larger 
than these ratios due to non-cache computations). I expected Hashtbl might be a 
bit slower than more specialised data structures, but the large speed 
difference with different data structures was unexpected. Presumably the 
slow-down is due to excessive hash collisions.

I had expected that the generic Hashtbl would be written to give adequate speed 
for all types of data and when I look at OCaml code on the web Hashtbl is 
usually used, so it seems most OCaml programmers believe the standard Hashtbl 
is a reasonable choice for most data types as well.

I haven't tested the Batteries or OCaml-Core hash tables so these may be more 
consistent, but if not, my question is can you predict how well Hashtbl will 
work for different types of data and so what to use it with, or it is just not 
reliable enough for general-purpose caching/memoization?

If anyone wants to look at the code, the (int list) Hashtbl version is at 
<http://rosettacode.org/wiki/Sokoban#OCaml> and the other versions are in 
<http://pastebin.com/hGn1AL9L> temporarily. Apart from the caching and the 
int/int pair changes they should be identical.

Toby

-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Reply via email to