Was it raw POS-tagged data or just raw data? Can you share the code / process you used?
--

Itamar Syn-Hershko
http://code972.com | @synhershko <https://twitter.com/synhershko>
Freelance Developer & Consultant
Lucene.NET committer and PMC member

On Thu, Jan 29, 2015 at 3:34 PM, Mark Harwood <[email protected]> wrote:

> I've built one before from raw data but you need:
> 1) a *lot* of data
> 2) a unique ID per person
> 3) some noise/variation in the names recorded for each person
>
> The input is of this form:
>
> personID  recorded_name
> ========  =============
> 1         Rob
> 1         Robert
> 1         Bob
> 2         Dave
> 2         David
> 2         Alice
> ...
>
> The output is a weighted graph of name<->variant e.g Robert == Bob with a
> strong confidence rating.
> Using this I know not just real names but also typos e.g. that "Janes" is
> more likely to be "James" than "Jane" (a common typo due to key locations
> on keyboard).
>
> On Thursday, January 29, 2015 at 5:28:33 AM UTC, David Kemp wrote:
>>
>> I am looking for synonym dictionaries of person names that I can use with
>> the Elasticsearch synonym analyser.
>> e.g. dictionaries that map "Ted" to "Edward", and "Bill" to "William".
>> I am curious to know what others are using.
>> So far I have found these two possible sources:
>>
>> https://code.google.com/p/nickname-and-diminutive-names-lookup/downloads/list
>> https://github.com/DallanQ/Names/wiki/Name-variant-files
>>
>> And perhaps
>> http://www.behindthename.com
>>
>> Thanks,
>> David
>>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/6a473177-7fdd-49d9-95e3-538b51df57f1%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/6a473177-7fdd-49d9-95e3-538b51df57f1%40googlegroups.com?utm_medium=email&utm_source=footer>.
>
> For more options, visit https://groups.google.com/d/optout.
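
For anyone following along: the thread does not include Mark's actual code, so the following is only a rough sketch of how a weighted name<->variant graph could be built from (personID, recorded_name) pairs like the ones quoted above. The co-occurrence counting, the Jaccard-style weight, the function names and the min_weight threshold are all my assumptions, not his method. The output lines use the comma-separated "a, b" form that the Elasticsearch synonym filter accepts for equivalent terms.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_variant_graph(records):
    """Build {(name_a, name_b): weight} from (person_id, recorded_name) pairs.

    Hypothetical helper: the weight is shared people / people recorded
    under either name (a Jaccard-style score chosen for illustration).
    """
    names_by_person = defaultdict(set)
    for person_id, name in records:
        names_by_person[person_id].add(name.strip().lower())

    name_counts = Counter()  # number of people recorded under each name
    pair_counts = Counter()  # number of people recorded under both names
    for names in names_by_person.values():
        name_counts.update(names)
        for a, b in combinations(sorted(names), 2):
            pair_counts[(a, b)] += 1

    return {
        (a, b): shared / (name_counts[a] + name_counts[b] - shared)
        for (a, b), shared in pair_counts.items()
    }


def to_synonym_lines(graph, min_weight=0.5):
    """Emit 'a, b' lines for edges at or above an assumed threshold."""
    return [f"{a}, {b}" for (a, b), w in sorted(graph.items()) if w >= min_weight]


# The sample rows from the quoted message.
records = [
    (1, "Rob"), (1, "Robert"), (1, "Bob"),
    (2, "Dave"), (2, "David"), (2, "Alice"),
]
graph = build_variant_graph(records)
for line in to_synonym_lines(graph):
    print(line)
```

With only these six rows every pair within a person gets the same weight, so "Alice" looks just as linked to "Dave" as "David" does. That is why a lot of data is needed: the noise only washes out once many different people are recorded as both "Dave" and "David" and few as both "Dave" and "Alice".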
