On Sat, 1 June 2013 at 14:27 +0200, Per Tunedal wrote:
> Hi,
> the transliteration seems straightforward.
> I suppose just passing a name through unchanged between Swedish and Danish
> would be easy? The national characters are written differently, though.
> Could they be added to the character sets, or would that complicate
> things? It seems a bit odd to transliterate them (Ärlig - Ærlig, Östen -
> Østen).

If you just want to pass a name through as-is, that is already what
happens when you have an unknown word.

> If regexps for names would slow down Apertium, I suppose the same
> applies to numbers. Something smarter might be useful.

Numerals do not slow it down as much, because their alphabet is
limited.

> What exactly are the pardefs in the pair is-sv supposed to do? I'm
> always lost when working with regexps :-(
> 
>    <pardef n="persons">
>       <!-- Ásta Árnadóttir, Ásta Á. Árnadóttir, Ásta Eva Árnardóttir -->
>       <e><re>[A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+ ([A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+|[A-ZÞÁÐÉÍÓÚÝÖÅ])?.? ?[A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+d</re><p><l></l><r></r></p><par n="Ásta_Árnad/óttir__np"/></e>
>       <!-- Davíð Oddsson, Davíð D. Oddsson, Davíð Gunnar Oddsson -->
>       <e><re>[A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+ ([A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+|[A-ZÞÁÐÉÍÓÚÝÖÅ])?.? ?[A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+ss</re><p><l></l><r></r></p><par n="Almar_Þórarinss/on__np"/></e>
>       <e><re>[A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+ ([A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+|[A-ZÞÁÐÉÍÓÚÝÖÅ])?.? ?[A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+s</re><p><l></l><r></r></p><par n="Snorri_Guðjohns/en__np"/></e>
>     </pardef>

If you take out the non-ASCII characters and look at just the last entry,
it becomes easier to understand:

<e><re>[A-Z][a-z]+ ([A-Z][a-z]+|[A-Z])?.? ?[A-Z][a-z]+s</re><p><l></l><r></r></p><par n="Snorri_Guðjohns/en__np"/></e>

---
[A-Z]                        Any capital letter
[a-z]+                       One or more lowercase letters
' '                          A space
([A-Z][a-z]+|[A-Z])?         Optionally either i) a capital letter
                               followed by one or more lowercase
                               letters or ii) a capital letter
                               on its own
.?                           Optionally a full stop
 ?                           Optionally a space
[A-Z]                        Any capital letter
[a-z]+                       One or more lowercase letters
s                            The letter 's'
Snorri_Guðjohns/en__np       The paradigm that adds the endings
                               (-en, -ens) and the tags
---
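To check which names the pattern accepts, here is a minimal Python sketch using the full character classes from the is-sv entry. It is only an approximation: Python's re module stands in for lttoolbox's own handling of <re>, and the "en"/"ens" endings are an assumption read off the lt-proc output below. The names UPPER, LOWER, STEM, FULL_NAME and matches() are made up for illustration.

```python
import re

# Character classes copied from the is-sv "persons" pardef.
UPPER = "[A-ZÞÁÐÉÍÓÚÝÖÅ]"
LOWER = "[a-záðéíóúýöå]"

# Stem pattern of the Snorri_Guðjohns/en__np entry: first name, space,
# optional middle name or initial, optional character (meant as a full
# stop; in Python's re an unescaped "." matches any character),
# optional space, surname stem ending in "s".
STEM = f"{UPPER}{LOWER}+ ({UPPER}{LOWER}+|{UPPER})?.? ?{UPPER}{LOWER}+s"

# Assumption: the paradigm appends "en" or "ens" to the stem.
FULL_NAME = re.compile(STEM + "(en|ens)")


def matches(name: str) -> bool:
    """Return True if the whole string is accepted as stem + ending."""
    return FULL_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ["Snorri Guðjohnsen", "Snorri A. Guðjohnsens",
                 "Snorri Almar Guðjohnsens", "snorri guðjohnsen"]:
        print(name, matches(name))
```

The lowercase "snorri guðjohnsen" is rejected because the pattern insists on a capital letter at the start of the first name and the surname.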

This will analyse:

$ echo Snorri Guðjohnsen | lt-proc is-en.automorf.bin 
^Snorri Guðjohnsen/Snorri Guðjohnsen<np><ant><m><sg><nom>/Snorri Guðjohnsen<np><ant><m><sg><acc>/Snorri Guðjohnsen<np><ant><m><sg><dat>$

$ echo Snorri Guðjohnsens | lt-proc is-en.automorf.bin 
^Snorri Guðjohnsens/Snorri Guðjohnsen<np><ant><m><sg><gen>$

$ echo Snorri A. Guðjohnsens | lt-proc is-en.automorf.bin 
^Snorri A. Guðjohnsens/Snorri A. Guðjohnsen<np><ant><m><sg><gen>$

$ echo Snorri Almar Guðjohnsens | lt-proc is-en.automorf.bin 
^Snorri Almar Guðjohnsens/Snorri Almar Guðjohnsen<np><ant><m><sg><gen>$

$ echo Snorri U. Guðjohnsens | lt-proc is-en.automorf.bin 
^Snorri U. Guðjohnsens/Snorri U. Guðjohnsen<np><ant><m><sg><gen>$

etc.
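How the regular expression and the paradigm combine can be mimicked with a toy analyser. This is a hypothetical Python sketch, not how lt-proc works internally (lttoolbox compiles everything into a single transducer): the PARADIGM table is inferred from the outputs above, the lemma is assumed to be the stem plus "en", and the whole input is treated as one token.

```python
import re

UPPER = "[A-ZÞÁÐÉÍÓÚÝÖÅ]"
LOWER = "[a-záðéíóúýöå]"
# Stem regex from the Snorri_Guðjohns/en__np entry.
STEM = re.compile(f"{UPPER}{LOWER}+ ({UPPER}{LOWER}+|{UPPER})?.? ?{UPPER}{LOWER}+s")

# Hypothetical stand-in for the paradigm: ending -> case tags,
# inferred from the lt-proc output above.
PARADIGM = {
    "en": ["nom", "acc", "dat"],
    "ens": ["gen"],
}
LEMMA_ENDING = "en"  # assumption: the lemma is the nominative form


def analyse(surface: str) -> str:
    """Return an lt-proc style ^surface/analysis1/...$ string."""
    analyses = []
    for ending, cases in PARADIGM.items():
        if not surface.endswith(ending):
            continue
        stem = surface[: -len(ending)]
        # The regex must cover the whole stem, as in the dictionary entry.
        if STEM.fullmatch(stem):
            lemma = stem + LEMMA_ENDING
            analyses += [f"{lemma}<np><ant><m><sg><{case}>" for case in cases]
    if not analyses:
        # Unknown words are marked with "*", as lt-proc does.
        return f"^{surface}/*{surface}$"
    return "^" + "/".join([surface] + analyses) + "$"


if __name__ == "__main__":
    print(analyse("Snorri Guðjohnsen"))
    print(analyse("Snorri A. Guðjohnsens"))
```

Splitting the surface form into stem and ending before matching mirrors the entry's structure: the regex only describes the stem, and the paradigm decides which endings and tags are allowed.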

You can also try this yourself...

Fran


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
