Hi,

thanks. I will try this out when I'm less busy.

What about making some kind of add-on to Apertium to handle proper names? It should be far easier than the existing finite-state transducer for transliteration, shouldn't it?

Yours,
Per Tunedal

On Fri, May 31, 2013, at 15:06, Jimmy O'Regan wrote:
> On 30 May 2013 18:47, Francis Tyers <[email protected]> wrote:
> > On Thu, 30 May 2013 at 19:42 +0200, Per Tunedal wrote:
> >> The most difficult part would be to find the names. Perhaps someone has
> >> any ideas?
> >
> > In Icelandic--English, regular expressions are used. See e.g. pardefs
> > for "persons" and "lastnames" in is.dix
> >
> > This is not altogether recommended though, as regular expressions slow
> > down your transducer. What you could do is use them on a large corpus
> > and then mass-add the ones after superficial checking.
>
> Census data is easy to find, gazetteers for NER are easy to find,
> en.wiktionary has categories for names
> (http://en.wiktionary.org/wiki/Category:Surnames_by_language
> http://en.wiktionary.org/wiki/Category:Male_given_names_by_language
> http://en.wiktionary.org/wiki/Category:Female_given_names_by_language),
> as do en.wikipedia (http://en.wikipedia.org/wiki/Category:Surnames
> http://en.wikipedia.org/wiki/Category:Given_names), da.wikipedia
> (http://da.wikipedia.org/wiki/Kategori:Efternavne
> http://da.wikipedia.org/wiki/Kategori:Fornavne), and sv.wikipedia
> (http://sv.wikipedia.org/wiki/Kategori:Efternamn
> http://sv.wikipedia.org/wiki/Kategori:Förnamn), and Europarl has
> speaker annotation which contains the name of the speaker.
>
> --
> <Sefam> Are any of the mentors around?
> <jimregan> yes, they're the ones trolling you

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
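
The suggestion in the quoted message above (run a pattern over a large corpus, then mass-add the names after a superficial check) could be sketched roughly as follows. This is only an illustration and not part of Apertium: the function name `candidate_names`, the `min_count` threshold, and the heuristic itself (a capitalised word that is not the first word of its sentence) are assumptions made for the example, and the output is a list to review by hand, not dictionary entries.

```python
import re
from collections import Counter

# Naive sentence splitter: break after ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# A "word" is a run of letters (Unicode-aware, so "Förnamn" is one token).
WORD = re.compile(r"[^\W\d_]+")


def candidate_names(text, min_count=2):
    """Return (word, count) pairs for likely proper names in `text`.

    Heuristic (an assumption for this sketch): a word counts as a
    candidate when it starts with an upper-case letter, continues in
    lower case, and is not the first word of its sentence.
    """
    counts = Counter()
    for sentence in SENTENCE_END.split(text):
        tokens = WORD.findall(sentence)
        # Skip the first token: it is capitalised regardless of being a name.
        for token in tokens[1:]:
            if token[0].isupper() and token[1:].islower():
                counts[token] += 1
    # Keep only words seen at least `min_count` times, most frequent first.
    return [(w, n) for w, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    sample = ("Yesterday Per met Francis. Then Per called Jimmy. "
              "Later Francis wrote back.")
    for word, count in candidate_names(sample):
        print(word, count)
```

The frequency threshold is there to keep the manual check short; names from the census or Wikipedia category lists mentioned above could be used to filter the candidates further.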
