Was it raw POS-tagged data or just raw data? Can you share the code or
process you used?

--

Itamar Syn-Hershko
http://code972.com | @synhershko <https://twitter.com/synhershko>
Freelance Developer & Consultant
Lucene.NET committer and PMC member

On Thu, Jan 29, 2015 at 3:34 PM, Mark Harwood <
[email protected]> wrote:

> I've built one before from raw data but you need:
> 1) a *lot* of data
> 2) a unique ID per person
> 3) some noise/variation in the names recorded for each person
>
> The input is of this form:
>
> personID   recorded_name
> =======  =============
> 1               Rob
> 1               Robert
> 1               Bob
> 2               Dave
> 2               David
> 2               Alice
> ...
>
> The output is a weighted graph of name<->variant pairs, e.g. Robert == Bob
> with a strong confidence rating.
> Using this I know not just real names but also typos, e.g. that "Janes" is
> more likely to be "James" than "Jane" (a common typo due to key locations
> on the keyboard).
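
A minimal sketch of how a graph like this might be built from input of the form shown above. This is not the code used for the original work: the function name build_variant_graph is hypothetical, the Jaccard overlap is only an assumed stand-in for the "confidence rating", and the rows for persons 3 and 4 are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_variant_graph(records):
    """Return {(name_a, name_b): weight} from (person_id, recorded_name) pairs.

    The weight is the Jaccard overlap of the sets of people recorded under
    each name. This scoring is an assumption, not the original method.
    """
    people_by_name = defaultdict(set)
    names_by_person = defaultdict(set)
    for person_id, name in records:
        key = name.strip().lower()  # normalise case and stray whitespace
        people_by_name[key].add(person_id)
        names_by_person[person_id].add(key)

    # Count how many people were recorded under both names of each pair.
    shared = Counter()
    for names in names_by_person.values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1

    graph = {}
    for (a, b), both in shared.items():
        either = len(people_by_name[a] | people_by_name[b])
        graph[(a, b)] = both / either
    return graph


# Persons 1 and 2 come from the example input; 3 and 4 are invented here.
records = [
    (1, "Rob"), (1, "Robert"), (1, "Bob"),
    (2, "Dave"), (2, "David"), (2, "Alice"),
    (3, "Bob"), (3, "Robert"),
    (4, "Dave"), (4, "David"),
]
graph = build_variant_graph(records)
for pair, weight in sorted(graph.items(), key=lambda kv: -kv[1]):
    print(pair, round(weight, 2))
```

With enough data, noise pairs such as Dave/Alice would be expected to score low, so a threshold on the weight could decide which pairs are worth keeping as synonyms.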
>
> On Thursday, January 29, 2015 at 5:28:33 AM UTC, David Kemp wrote:
>>
>> I am looking for synonym dictionaries of person names that I can use with
>> the Elasticsearch synonym analyser.
>> e.g. dictionaries that map "Ted" to "Edward", and "Bill" to "William".
>> I am curious to know what others are using.
>> So far I have found these two possible sources:
>>
>> https://code.google.com/p/nickname-and-diminutive-names-lookup/downloads/list
>> https://github.com/DallanQ/Names/wiki/Name-variant-files
>>
>> And perhaps
>> http://www.behindthename.com
>>
>> Thanks,
>> David
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/6a473177-7fdd-49d9-95e3-538b51df57f1%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/6a473177-7fdd-49d9-95e3-538b51df57f1%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

