Karsten Hilbert wrote:
> On Wed, Apr 19, 2006 at 08:13:39AM +1000, Tim Churches wrote:
>
>> An excellent source of sex-specific given names and surname can be found
>> at http://www.census.gov/genealogy/names/names_files.html
> Thanks, very helpful !
>
>> Aside to James: we should use these lists for further volume/concurrency
>> testing of NetEpi, rather than the lists I used previously from the
>> Australian telephone book listings - so there can be no confusion with
>> real people
> Well, I for one would append " (test)" to all lastnames used
> in such tests.

Having the same string in every name makes it hard to judge the effectiveness of our look-up routines, which will shortly use bigram indexing (simple to implement, yet apparently rather effective), as described in this paper:

http://datamining.anu.edu.au/publications/2003/kdd03-3pages.pdf

For all other purposes, though, I agree: one should clearly signpost synthetic data as such to avoid confusion and embarrassment.

Tim C
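
To make the concern about a shared suffix concrete, here is a minimal sketch of bigram indexing for name look-up. It is not the NetEpi implementation, and it is not taken from the paper above: the function names (bigrams, build_index, lookup), the Dice coefficient as the similarity score, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict


def bigrams(name):
    """Return the set of adjacent character pairs in a lower-cased name."""
    s = name.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def build_index(names):
    """Map each bigram to the set of names that contain it (hypothetical helper)."""
    index = defaultdict(set)
    for name in names:
        for bg in bigrams(name):
            index[bg].add(name)
    return index


def lookup(index, query, threshold=0.5):
    """Return (name, score) pairs whose Dice coefficient with the query
    is at least the threshold, best match first.

    The Dice coefficient and the default threshold are assumptions of this sketch.
    """
    q = bigrams(query)
    # Only names sharing at least one bigram with the query are candidates.
    candidates = set()
    for bg in q:
        candidates |= index.get(bg, set())
    results = []
    for name in candidates:
        n = bigrams(name)
        score = 2 * len(q & n) / (len(q) + len(n))
        if score >= threshold:
            results.append((name, score))
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    plain = build_index(["Smith", "Smyth", "Schmidt", "Jones"])
    print(lookup(plain, "Smith"))

    tagged = build_index(["Smith (test)", "Jones (test)"])
    print(lookup(tagged, "Smith (test)"))
```

With the plain names, a query for "Smith" returns "Smith" and "Smyth" but not "Jones". Once every surname carries " (test)", the bigrams from that suffix are shared by all entries, so "Jones (test)" clears the same threshold for "Smith (test)", which is why tagged names make it hard to judge how well the matching works.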
