Karsten Hilbert wrote:
> On Wed, Apr 19, 2006 at 08:13:39AM +1000, Tim Churches wrote:
> 
>> An excellent source of sex-specific given names and surnames can be found
>> at http://www.census.gov/genealogy/names/names_files.html
> Thanks, very helpful !
> 
>> Aside to James: we should use these lists for further volume/concurrency
>> testing of NetEpi, rather than the lists I used previously from the
>> Australian telephone book listings - so there can be no confusion with
>> real people
> Well, I for one would append " (test)" to all lastnames used
> in such tests.

Having the same string in every name makes it hard to judge the
effectiveness of our look-up routines, which will shortly use bigram
indexing (which is simple to implement but seems to be rather effective)
as described in this paper:
http://datamining.anu.edu.au/publications/2003/kdd03-3pages.pdf
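
To make the point concrete, here is a minimal sketch of bigram indexing
in Python. It is not NetEpi's code and not necessarily the exact method
in the paper: the function names, the Dice similarity measure and the
0.5 threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict


def bigrams(name):
    """Return the set of adjacent character pairs in a lower-cased name."""
    s = name.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def build_index(names):
    """Map each bigram to the set of names that contain it (inverted index)."""
    index = defaultdict(set)
    for name in names:
        for bg in bigrams(name):
            index[bg].add(name)
    return index


def lookup(index, query, threshold=0.5):
    """Return indexed names whose Dice similarity to query is >= threshold.

    The Dice coefficient and the default threshold are illustrative
    assumptions, not values taken from NetEpi or the paper.
    """
    q = bigrams(query)
    shared_counts = defaultdict(int)
    # Only names sharing at least one bigram with the query are visited.
    for bg in q:
        for name in index.get(bg, ()):
            shared_counts[name] += 1
    scored = []
    for name, shared in shared_counts.items():
        dice = 2.0 * shared / (len(q) + len(bigrams(name)))
        if dice >= threshold:
            scored.append((dice, name))
    # Best match first; ties broken alphabetically.
    return [name for dice, name in sorted(scored, key=lambda t: (-t[0], t[1]))]


index = build_index(["Smith", "Smyth", "Schmidt", "Jones"])
matches = lookup(index, "Smith")

# The same surnames with a common suffix appended to every one.
tagged = build_index(["Smith (test)", "Jones (test)"])
tagged_matches = lookup(tagged, "Smith (test)")
```

With these toy definitions, "Smith" matches "Smith" and "Smyth" but not
"Jones". Once " (test)" is appended to every surname, the suffix alone
contributes six shared bigrams, so "Smith (test)" and "Jones (test)"
score about 0.57 and both come back as matches.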

But for all other purposes, yes, I agree, one should clearly signpost
the synthetic data as such to avoid confusion and embarrassment.

Tim C


