Romain Bardou wrote:
> On 13/03/2012 14:23, Gerd Stolpmann wrote:
> >
> >> The best compromise to me is to leave the default for Hashtbl, but
> >> properly document this aspect in the manual (with a succinct explanation
> >> and one relevant pointer). That way:
> >> - you don't break compatibility
> >> - you keep default reproducibility (which is a real feature)
> >> - you teach beginners like myself about tough aspects related to the
> >>   use of a data structure in some frequent use cases.
> >
> > Basically I like the idea of "teaching" users this way. The typical
> > user will understand the impact, and act accordingly. Nevertheless, I
> > would like it to be made as easy as possible to provide good seeds if
> > required. The Random module is definitely not good enough (e.g. if you
> > know when the program was started, as for a CGI, and the CGI reveals
> > information it had better not, such as the pid, the Random seed is made
> > from less than 10 unpredictable bits, and on some systems even 0 bits).
> >
> > The ideal would be to guide the user to the decision whether
> > protection is necessary, and if the answer is yes, to give the
> > instructions how to do it (and provide all means for it, of course).
>
> This teaching idea sounds great indeed, but on the other hand, where do
> we draw the line? If we push this reasoning too far, we could remove
> typing altogether and just tell the programmer to be careful. What is the
> difference here? Is a potential DoS attack "less bad" than a seg fault?
>
> So although the idea of teaching the programmer through the documentation
> makes sense, I would put it the other way around: make the safer behavior
> the default, and give debugging tools with proper warnings. Here the tool
> is a "set_seed" function and the warning is in its documentation: "using
> the same seed every time can lead to DoS attacks".

+1. Surely, in projects where repeatability is important, the change in
behaviour to randomly seeded tables would be quickly noticed through failing
unit tests and so on, and could be quickly fixed if the appropriate
"set_seed" or whatever is there? Repeatability seems the more niche use of a
hash table, IMHO, even if it's relied on by some of OCaml's bigger players!

One could even imagine arranging things so that programs linked normally use
a randomly seeded hash table and programs *linked* with -g use a fixed seed,
for debugging (i.e. the current behaviour), again with suitable documentation
explaining why you don't put debug builds of software on live web servers...


David
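
For anyone reading this in the archive, here is a minimal sketch of how the two behaviours discussed above can be selected in OCaml releases that postdate this thread. It assumes OCaml 4.03 or later (Hashtbl.is_randomized is not available before that); the optional ~random argument and Hashtbl.randomize appeared in 4.00. There is no "set_seed" in the standard library, that name is only the hypothetical one used in this thread. The table names are illustrative.

```ocaml
(* Sketch, assuming OCaml >= 4.03. Table names are illustrative only. *)

(* Per-table opt-in: the hash function of this table is seeded randomly
   when the table is created. *)
let randomized : (string, int) Hashtbl.t = Hashtbl.create ~random:true 16

(* No ~random argument: the table follows the global setting, which is
   the fixed-seed, reproducible behaviour unless randomization has been
   switched on globally. *)
let reproducible : (string, int) Hashtbl.t = Hashtbl.create 16

let () =
  (* Lookups behave the same either way; only the internal ordering of
     bindings differs from run to run for the randomized table. *)
  Hashtbl.replace randomized "a" 1;
  Hashtbl.replace reproducible "a" 1;
  assert (Hashtbl.find randomized "a" = 1);
  assert (Hashtbl.find reproducible "a" = 1);
  (* Global switch: tables created from now on without ~random are
     randomized as well. *)
  Hashtbl.randomize ();
  assert (Hashtbl.is_randomized ())
```

The global switch can also be set from outside the program with the R flag of OCAMLRUNPARAM, which is roughly the "decide at deployment time" variant of the link-time idea above, though it depends on the environment rather than on -g.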