On 30 May 2013 18:47, Francis Tyers <[email protected]> wrote:
> On Thu, 30/05/2013 at 19:42 +0200, Per Tunedal wrote:
>> The most difficult part would be finding the names. Does anyone have
>> any ideas?
>
> In Icelandic--English, regular expressions are used. See e.g. pardefs
> for "persons" and "lastnames" in is.dix
>
> This is not altogether recommended, though, as regular expressions slow
> down your transducer. What you could do instead is run them over a large
> corpus and then mass-add the names they match after a quick manual check.
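Below is a minimal Python sketch of that corpus-harvesting workflow. It rests on several assumptions. The corpus is plain text, one string per line. A "name" is approximated by a hypothetical pattern: a capitalised word that follows a lowercase word and is therefore probably not sentence-initial. Candidates seen at least `min_count` times are rendered as monodix entries for review. The pattern will also catch place names and other proper nouns, which is why the manual check still matters. The paradigm name passed to `to_dix_entries` is a placeholder and must be replaced with a real pardef from your own dictionary.

```python
import re
from collections import Counter

# Hypothetical heuristic: a capitalised word (Nordic letters allowed) that
# directly follows a lowercase word and a space, so it is probably not the
# first word of a sentence. The lookbehind is fixed-width (2 characters).
CANDIDATE_RE = re.compile(r"(?<=[a-zåäöæø] )([A-ZÅÄÖÆØ][a-zåäöæø]+)")


def harvest_candidates(lines, min_count=2):
    """Count candidate names across the corpus; keep the frequent ones."""
    counts = Counter()
    for line in lines:
        counts.update(CANDIDATE_RE.findall(line))
    # Rare hits are more likely to be noise, so drop them before review.
    return sorted(word for word, n in counts.items() if n >= min_count)


def to_dix_entries(names, paradigm):
    """Render names as Apertium monodix <e> entries using one paradigm.

    `paradigm` is whatever pardef your dictionary uses for proper nouns;
    it is NOT taken from is.dix here.
    """
    return [f'<e lm="{n}"><i>{n}</i><par n="{paradigm}"/></e>' for n in names]


if __name__ == "__main__":
    corpus = [
        "Vi träffade Anna i går och sedan Anna igen.",
        "Han ringde Erik men inte Anna.",
    ]
    # "Per__np" is a hypothetical paradigm name used only for illustration.
    for entry in to_dix_entries(harvest_candidates(corpus), "Per__np"):
        print(entry)
```

With the two sample lines, only "Anna" reaches the default threshold of two occurrences. Lowering `min_count` trades more manual checking for better recall.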

Census data is easy to find, and so are gazetteers for NER. Several wikis
also have categories for names:

- en.wiktionary
  http://en.wiktionary.org/wiki/Category:Surnames_by_language
  http://en.wiktionary.org/wiki/Category:Male_given_names_by_language
  http://en.wiktionary.org/wiki/Category:Female_given_names_by_language
- en.wikipedia
  http://en.wikipedia.org/wiki/Category:Surnames
  http://en.wikipedia.org/wiki/Category:Given_names
- da.wikipedia
  http://da.wikipedia.org/wiki/Kategori:Efternavne
  http://da.wikipedia.org/wiki/Kategori:Fornavne
- sv.wikipedia
  http://sv.wikipedia.org/wiki/Kategori:Efternamn
  http://sv.wikipedia.org/wiki/Kategori:Förnamn

Europarl also has speaker annotation, which contains the name of each
speaker.
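One way to pull the wiki categories in bulk is the standard MediaWiki API (`action=query&list=categorymembers`), following `cmcontinue` until the listing is complete. The Python sketch below assumes that API. The helper names are hypothetical, and the sample response in the tests is made-up data. It keeps only main-namespace pages (`cmnamespace=0`) and strips trailing disambiguators such as " (efternavn)" from titles. Some of the "by language" categories may contain mostly subcategories rather than pages, in which case you would query a per-language subcategory instead.

```python
import json
import re
import urllib.parse
import urllib.request


def category_url(wiki, category, cont=None):
    """Build a MediaWiki API URL listing main-namespace pages in a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmnamespace": "0",  # articles only, skip subcategories and files
        "cmlimit": "500",
        "format": "json",
    }
    if cont:
        params["cmcontinue"] = cont  # resume where the previous batch stopped
    return f"https://{wiki}/w/api.php?" + urllib.parse.urlencode(params)


def titles_from_response(data):
    """Extract page titles, dropping a trailing disambiguator like ' (surname)'."""
    members = data.get("query", {}).get("categorymembers", [])
    return [re.sub(r"\s*\(.*\)$", "", m["title"]) for m in members]


def next_continue(data):
    """Return the cmcontinue token, or None when the listing is complete."""
    return data.get("continue", {}).get("cmcontinue")


def fetch_category(wiki, category):
    """Fetch every title in a category (requires network access)."""
    titles, cont = [], None
    while True:
        req = urllib.request.Request(
            category_url(wiki, category, cont),
            headers={"User-Agent": "name-harvest-sketch/0.1"},
        )
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        titles.extend(titles_from_response(data))
        cont = next_continue(data)
        if not cont:
            return titles


if __name__ == "__main__":
    # Example call (network): fetch_category("da.wikipedia.org", "Kategori:Efternavne")
    print(category_url("da.wikipedia.org", "Kategori:Efternavne"))
```

The result is a raw candidate list. It still needs the same superficial check before mass-adding, since category pages can include non-name articles.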

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff