Re: [Apertium-stuff] Rules for proper names

Per Tunedal Mon, 03 Jun 2013 10:25:46 -0700

Hi,
I have refrained from entering a lot of first names and last names for
persons in the dictionaries, because it seemed to be silly work.


Now I just thought that it might be easy to translate them with some
rule, if you could just find them somehow. But I don't know if there's
any suitable way to identify names. If they could be found and
processed, they would not be marked as unknown (because they have been
translated).
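One crude way to find name candidates, outside Apertium's own analyser, is a capitalisation heuristic. The Python sketch below is not part of Apertium, and the function name `find_name_candidates` is hypothetical. It assumes that a capitalised word which does not open a sentence is probably a name. It will also catch place names and miss names at the start of a sentence, so at best it could pre-filter words for a human to check.

```python
import re

# Assumption: a name is an uppercase letter (including Swedish/Danish
# national letters) followed only by lowercase letters.
CAPWORD = re.compile(r"[A-ZÅÄÖÆØ][a-zåäöæø]+")
SENTENCE_END = (".", "!", "?")


def find_name_candidates(text):
    """Return capitalised words that do not start a sentence (heuristic only)."""
    candidates = []
    tokens = text.split()
    for i, token in enumerate(tokens):
        word = token.strip(",;:()\"'.!?")
        if not CAPWORD.fullmatch(word):
            continue
        # Skip the first word and any word right after sentence-final punctuation.
        if i == 0 or tokens[i - 1].endswith(SENTENCE_END):
            continue
        candidates.append(word)
    return candidates


print(find_name_candidates("Igår träffade jag Östen Ärlig i Malmö. Han mådde bra."))
```

Here it returns the two personal names plus the place name Malmö, which is why the output would still need sorting into person and place lists.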

Yours,
Per Tunedal

On Sat, Jun 1, 2013, at 22:51, Bernard Chardonneau wrote:
> 
> > X-Mailer: MessagingEngine.com Webmail Interface - html
> > Date: Sat, 01 Jun 2013 14:27:48 +0200
> > From: Per Tunedal <[email protected]>
> > To: Apertium Stuff <[email protected]>
> > Reply-To: [email protected]
> > Subject: Re: [Apertium-stuff] Rules for proper names
> >
> > Hi,
> > the transliteration seems straightforward.
> > I suppose just passing a name through unchanged between Swedish and Danish
> > would be easy? The national characters are written differently, though.
> > Could they be added to the character sets, or would that complicate
> > things? It seems a bit odd to transliterate them (Ärlig - Ærlig, Östen -
> > Østen).
> >
> > If regexps for names would slow down Apertium, I suppose the same
> > applies to numbers. Something smarter might be useful.
> >
> > What exactly are the pardefs in the pair is-sv supposed to do? I'm
> > always lost when working with regexps :-(
> >
> >    <pardef n="persons">
> >       <!-- Ásta Árnadóttir, Ásta Á. Árnadóttir, Ásta Eva Árnardóttir -->
> >       <e><re>[A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+ ([A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+|[A-ZÞÁÐÉÍÓÚÝÖÅ])?.? ?[A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+d</re><p><l></l><r></r></p><par n="Ásta_Árnad/óttir__np"/></e>
> >       <!-- Davíð Oddsson, Davíð D. Oddsson, Davíð Gunnar Oddson -->
> >       <e><re>[A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+ ([A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+|[A-ZÞÁÐÉÍÓÚÝÖÅ])?.? ?[A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+ss</re><p><l></l><r></r></p><par n="Almar_Þórarinss/on__np"/></e>
> >       <e><re>[A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+ ([A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+|[A-ZÞÁÐÉÍÓÚÝÖÅ])?.? ?[A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+s</re><p><l></l><r></r></p><par n="Snorri_Guðjohns/en__np"/></e>
> >     </pardef>
> >
> > Yours,
> > Per Tunedal
> >
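As for what the is-sv pardefs do: each `<e>` uses `<re>` to match a whole name up to the start of the surname's ending. It contributes nothing through the empty `<l></l><r></r>` pair and then continues into the named paradigm, which presumably supplies the ending (`óttir`, `on`, `en`) and the tags. The Python sketch below rebuilds the three regexes with `re` to show what they accept. It assumes the line breaks in the quoted XML were originally single spaces. It checks only the ending written after the slash in each paradigm name, although the real paradigms may also cover inflected forms. The helper `analyse` is hypothetical. In Python the unescaped `.` accepts any character, not only the full stop after an initial; whether lttoolbox treats it the same way is an assumption worth checking.

```python
import re

# Character classes copied from the is-sv <re> elements.
UPPER = "[A-ZÞÁÐÉÍÓÚÝÖÅ]"
LOWER = "[a-záðéíóúýöå]"

# First name, space, optional middle name or initial, one optional
# character of any kind (meant for the dot after an initial), optional
# space, and a capitalised surname stem. Spaces assumed from the wrapped XML.
PREFIX = f"{UPPER}{LOWER}+ ({UPPER}{LOWER}+|{UPPER})?.? ?{UPPER}{LOWER}+"

# (regex of each <e>, ending written after "/" in its paradigm name)
ENTRIES = [
    (PREFIX + "d", "óttir"),  # par Ásta_Árnad/óttir__np
    (PREFIX + "ss", "on"),    # par Almar_Þórarinss/on__np
    (PREFIX + "s", "en"),     # par Snorri_Guðjohns/en__np
]


def analyse(name):
    """Return (part matched by <re>, paradigm ending), or None if no entry fits."""
    for pattern, ending in ENTRIES:
        stem = name[: -len(ending)]
        if name.endswith(ending) and re.fullmatch(pattern, stem):
            return stem, ending
    return None


for name in ["Ásta Á. Árnadóttir", "Davíð Oddsson", "Snorri Guðjohnsen", "Per Tunedal"]:
    print(name, "->", analyse(name))
```

For example, "Ásta Á. Árnadóttir" splits into "Ásta Á. Árnad" (matched by the regex) and "óttir" (added by the paradigm), while a Swedish name such as "Per Tunedal" matches none of the three entries.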
> 
> So, finally your proper nouns are changing (at least some of them)
> between the two languages.
> 
> I think your proposal is a bit complicated. You are asking Apertium
> to do extra work every time something is translated, instead of
> adding some more entries to the dictionaries once.
> 
> First, to distinguish proper names from unknown words starting with
> an uppercase letter, you will need to enter these proper names in
> the monodices.
> 
> If your language pair translates in both directions, you will
> need the Swedish version of each proper name in the Swedish monodix
> and the Danish version in the Danish monodix.
> 
> So the question may just be how to avoid also putting them into the
> bidix, and to translate them some other way.
> 
> In the .t1x file, you can output the source-language lemma
> using side="sl".
> I have not tested it, but it may be a way to avoid the @ symbol
> (which marks a word with no translation in the bidix).
> 
> If you need to change several characters into others, I would rather
> do that in post-generation.
> But post-generation does not seem to be documented.
> 
> 
> Anyway, you spoke about doing something straightforward.
> 
> So why not just write a straightforward shell script to generate
> the correct entries for the 3 dictionaries?
> 
> You can start with a list of several thousand proper names
> written in Swedish.
> 
> A sed command (or maybe even a tr command, if it works
> correctly with the UTF-8 charset) can convert any name in your
> list to Danish.
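The sed/tr step could look like this in Python, using `str.translate` in place of sed or tr (a swapped-in technique, not the command suggested above). It assumes the only letters that differ are Ä/ä → Æ/æ and Ö/ö → Ø/ø, as in Per's examples; Å/å is the same in both alphabets. The helper name `sv_to_da` is hypothetical.

```python
# Letters that differ between Swedish and Danish spelling (assumption: these
# four are enough). Å/å is shared by both alphabets, so it is left out.
SV_TO_DA = str.maketrans({"Ä": "Æ", "ä": "æ", "Ö": "Ø", "ö": "ø"})


def sv_to_da(name):
    """Respell a Swedish name with Danish letters (hypothetical helper)."""
    return name.translate(SV_TO_DA)


names = ["Ärlig", "Östen", "Åsa", "Sjöberg"]
print([sv_to_da(n) for n in names])
# → ['Ærlig', 'Østen', 'Åsa', 'Sjøberg']
```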
> 
> Then choose a paradigm for each language, and use awk (or gawk),
> or even echo in a loop, to generate entries for the 3 dictionaries.
> Just test with fgrep first to avoid adding the same word more
> than once.
> 
> If some names are for persons and others for places, with
> different paradigms in the two cases, just make two lists
> of proper names and process them separately with the right
> paradigms.
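The generation step, with an fgrep-style duplicate check and one list per paradigm, can be sketched in Python as follows. The entry layout follows the usual lttoolbox monodix form (`<e lm>`, `<i>`, `<par>`) and bidix form (`<l>`/`<r>`). The paradigm names `Anna__np` and `Stockholm__np`, the single `np` tag and the `known` set are all hypothetical placeholders. A real pair would use its own paradigms and tags, and would fill `known` from the existing dictionaries.

```python
from xml.sax.saxutils import escape

# Swedish letters spelled differently in Danish (Å/å is shared).
SV_TO_DA = str.maketrans({"Ä": "Æ", "ä": "æ", "Ö": "Ø", "ö": "ø"})


def attr(text):
    """Escape text for a double-quoted XML attribute value."""
    return escape(text, {'"': "&quot;"})


def mono_entry(lemma, paradigm):
    """Monodix entry: the whole name is the stem; the paradigm adds the tags."""
    return f'<e lm="{attr(lemma)}"><i>{escape(lemma)}</i><par n="{attr(paradigm)}"/></e>'


def bi_entry(sv, da):
    """Bidix entry pairing the two spellings (assumption: np tag only)."""
    return (f'<e><p><l>{escape(sv)}<s n="np"/></l>'
            f'<r>{escape(da)}<s n="np"/></r></p></e>')


def generate(names, sv_par, da_par, known):
    """Return (sv monodix, da monodix, bidix) lines for names not in `known`."""
    rows = []
    for sv in names:
        if sv in known:  # plays the role of the fgrep check
            continue
        known.add(sv)  # also catches repeats inside the list itself
        da = sv.translate(SV_TO_DA)
        rows.append((mono_entry(sv, sv_par), mono_entry(da, da_par), bi_entry(sv, da)))
    return rows


# Hypothetical data: lemmas already present, and one list per paradigm.
known = {"Åsa"}
persons = ["Östen", "Åsa", "Ärlig", "Östen"]
places = ["Malmö"]

# Hypothetical paradigm names; use whatever the pair's monodices define.
rows = generate(persons, "Anna__np", "Anna__np", known)
rows += generate(places, "Stockholm__np", "Stockholm__np", known)

for sv_line, da_line, bi_line in rows:
    print(sv_line, da_line, bi_line, sep="\n")
```

Each tuple's three lines would then be appended to the Swedish monodix, the Danish monodix and the bidix respectively; "Åsa" and the second "Östen" are skipped.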
> 
> 
> 
> --------------------------------
> Bernard Chardonneau (France)
> Phone : [33] 1 64 90 87 04 or [33] 9 72 36 32 90
> (from Sept to June except holidays)
> GSM phone : [33] 6 49 95 13 95 (French school holidays, zone C)
> 
> Multilingual websites for my free software:
> http://libremail.free.fr and http://libremail.tuxfamily.org
> http://cyloop.tuxfamily.org (mainly translated with Apertium)
> 
> My general website (in French only)
> http://bech.free.fr
> 
> ------------------------------------------------------------------------------
> Get 100% visibility into Java/.NET code with AppDynamics Lite
> It's a free troubleshooting tool designed for production
> Get down to code-level detail for bottlenecks, with <2% overhead.
> Download for free and get started troubleshooting in minutes.
> http://p.sf.net/sfu/appdyn_d2d_ap2
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
