On 30 May 2013 18:47, Francis Tyers <[email protected]> wrote:
> On Thu, 30 May 2013 at 19:42 +0200, Per Tunedal wrote:
>> The most difficult part would be to find the names. Perhaps someone has
>> some ideas?
>
> In Icelandic--English, regular expressions are used. See e.g. the pardefs
> for "persons" and "lastnames" in is.dix.
>
> This is not altogether recommended though, as regular expressions slow
> down your transducer. What you could do is run them over a large corpus
> and then mass-add the matches after a superficial check.
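The corpus approach suggested above (run the regular expressions over a large corpus, check the hits, then mass-add them as ordinary entries instead of keeping the regexes in the transducer) can be sketched in Python roughly as follows. The surname pattern, the frequency threshold and the pardef name `Hansen__np` are illustrative assumptions, not what is.dix actually uses; the output is meant for manual checking before it goes into a .dix file.

```python
import re
from collections import Counter
from xml.sax.saxutils import escape

# Assumed, illustrative pattern: capitalised words ending in -sen
# (Danish/Norwegian) or -son/-sson (Swedish). It will produce false
# positives such as "Reason", hence the manual check afterwards.
SURNAME_RE = re.compile(r"\b[A-ZÆØÅÄÖ][a-zæøåäö]+(?:sen|s?son)\b")


def candidate_surnames(lines, min_count=2):
    """Return regex matches seen at least min_count times, sorted."""
    counts = Counter()
    for line in lines:
        counts.update(SURNAME_RE.findall(line))
    return sorted(w for w, c in counts.items() if c >= min_count)


def to_dix_entry(lemma, pardef="Hansen__np"):
    """Format one lemma as a plain .dix entry.

    "Hansen__np" is a hypothetical paradigm name; substitute the
    proper-noun pardef your dictionary actually defines.
    """
    lm = escape(lemma, {'"': "&quot;"})
    return f'<e lm="{lm}"><i>{escape(lemma)}</i><par n="{pardef}"/></e>'


if __name__ == "__main__":
    corpus = [
        "Jensen og Nielsen mødtes i går.",
        "Andersson talade med Jensen.",
        "Nielsen svarede.",
        "Reason is not a name.",
    ]
    for name in candidate_surnames(corpus):
        print(to_dix_entry(name))
```

Raising `min_count` trades recall for fewer false positives; on a real corpus the printed entries would still be skimmed by hand before being pasted into the dictionary.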
Census data is easy to find, gazetteers for NER are easy to find,
en.wiktionary has categories for names
(http://en.wiktionary.org/wiki/Category:Surnames_by_language
http://en.wiktionary.org/wiki/Category:Male_given_names_by_language
http://en.wiktionary.org/wiki/Category:Female_given_names_by_language),
as do en.wikipedia (http://en.wikipedia.org/wiki/Category:Surnames
http://en.wikipedia.org/wiki/Category:Given_names), da.wikipedia
(http://da.wikipedia.org/wiki/Kategori:Efternavne
http://da.wikipedia.org/wiki/Kategori:Fornavne), and sv.wikipedia
(http://sv.wikipedia.org/wiki/Kategori:Efternamn
http://sv.wikipedia.org/wiki/Kategori:Förnamn), and Europarl has speaker
annotation which contains the name of the speaker.

--
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
