On Sat, 01/06/2013 at 14:27 +0200, Per Tunedal wrote:
> Hi,
> the transliteration seems straightforward.
> I suppose just passing over a name unchanged between Swedish and Danish
> would be easy? The national characters are written differently, though.
> Could they be added to the character sets or would that complicate
> things? It seems a bit odd to transliterate them (Ärlig - Ærlig, Östen -
> Østen).
If you just want to pass a name through as-is, that's already what
happens with an unknown word.
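If you would rather respell the names, the adaptation Per describes (Ärlig to Ærlig, Östen to Østen) is a one-to-one character substitution. Here is a minimal Python sketch that runs outside Apertium. It assumes only ä/ö need changing, since å is written the same way in both languages. The helper name `sv_to_da_spelling` is hypothetical.

```python
# Hypothetical helper (not part of Apertium): map the Swedish letters
# ä/ö to their Danish counterparts æ/ø. å is left alone because both
# languages spell it the same way.
SV_TO_DA = str.maketrans({"ä": "æ", "Ä": "Æ", "ö": "ø", "Ö": "Ø"})


def sv_to_da_spelling(name: str) -> str:
    """Return the name with Swedish ä/ö respelt as Danish æ/ø."""
    return name.translate(SV_TO_DA)


print(sv_to_da_spelling("Ärlig"))  # Ærlig
print(sv_to_da_spelling("Östen"))  # Østen
```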
> If regexps for names would slow down Apertium, I suppose the same
> applies to numbers. Something smarter might be useful.
Numerals do not slow it down so much because there is a limited
alphabet.
> What exactly are the pardefs in the pair is-sv supposed to do? I'm
> always lost when working with regexps :-(
>
> <pardef n="persons">
> <!-- Ásta Árnadóttir, Ásta Á. Árnadóttir, Ásta Eva Árnardóttir -->
> <e><re>[A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+ ([A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+|[A-ZÞÁÐÉÍÓÚÝÖÅ])?.? ?[A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+d</re><p><l></l><r></r></p><par n="Ásta_Árnad/óttir__np"/></e>
> <!-- Davíð Oddsson, Davíð D. Oddsson, Davíð Gunnar Oddson -->
> <e><re>[A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+ ([A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+|[A-ZÞÁÐÉÍÓÚÝÖÅ])?.? ?[A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+ss</re><p><l></l><r></r></p><par n="Almar_Þórarinss/on__np"/></e>
> <e><re>[A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+ ([A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+|[A-ZÞÁÐÉÍÓÚÝÖÅ])?.? ?[A-ZÞÁÐÉÍÓÚÝÖÅ][a-záðéíóúýöå]+s</re><p><l></l><r></r></p><par n="Snorri_Guðjohns/en__np"/></e>
> </pardef>
If you take out the non-ASCII chars it becomes easier to understand:
<e><re>[A-Z][a-z]+ ([A-Z][a-z]+|[A-Z])?.? ?[A-Z][a-z]+s</re><p><l></l><r></r></p><par n="Snorri_Guðjohns/en__np"/></e>
---
[A-Z]                   Any capital letter
[a-z]+                  Any sequence of lowercase letters
(space)                 A space
([A-Z][a-z]+|[A-Z])?    Optionally either i) a capital letter followed
                        by a sequence of lowercase letters or ii) a
                        capital letter on its own
.?                      Optionally a full stop
(space)?                Optionally a space
[A-Z]                   Any capital letter
[a-z]+                  Any sequence of lowercase letters
s                       The letter 's'
Snorri_Guðjohns/en__np  The paradigm that adds the endings and tags
                        to the matched stem
---
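You can check the table against real strings by rebuilding the entry's stem pattern piece by piece with Python's `re` module in verbose mode, using the full Icelandic character classes from the pardef. This is only an approximation of lttoolbox's regex handling. It assumes `.?` is a literal full stop, as the table says. It also assumes the space after the first name is required; in the quoted pardef that space is hidden by the line wrapping. It tests stems only, since the paradigm endings are added afterwards.

```python
import re

UPPER = "[A-ZÞÁÐÉÍÓÚÝÖÅ]"  # any capital letter, incl. Icelandic ones
LOWER = "[a-záðéíóúýöå]"   # any lowercase letter, incl. Icelandic ones

# Stem of the "...s" entry, one table row per line. re.VERBOSE ignores
# unescaped whitespace outside character classes, so literal spaces are
# written as "[ ]".
SURNAME_STEM_S = re.compile(
    rf"""
    {UPPER}{LOWER}+             # first name
    [ ]                         # a space
    ({UPPER}{LOWER}+|{UPPER})?  # optional middle name or bare initial
    \.?                         # optional full stop (assumed literal)
    [ ]?                        # optional space
    {UPPER}{LOWER}+             # surname letters
    s                           # the letter s
    """,
    re.VERBOSE,
)

for stem in ["Snorri Guðjohns", "Snorri A. Guðjohns",
             "Snorri Almar Guðjohns", "Guðjohns"]:
    print(stem, bool(SURNAME_STEM_S.fullmatch(stem)))
# Snorri Guðjohns True
# Snorri A. Guðjohns True
# Snorri Almar Guðjohns True
# Guðjohns False
```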
With this entry in the dictionary, lt-proc will analyse:
$ echo Snorri Guðjohnsen | lt-proc is-en.automorf.bin
^Snorri Guðjohnsen/Snorri Guðjohnsen<np><ant><m><sg><nom>/Snorri Guðjohnsen<np><ant><m><sg><acc>/Snorri Guðjohnsen<np><ant><m><sg><dat>$
$ echo Snorri Guðjohnsens | lt-proc is-en.automorf.bin
^Snorri Guðjohnsens/Snorri Guðjohnsen<np><ant><m><sg><gen>$
$ echo Snorri A. Guðjohnsens | lt-proc is-en.automorf.bin
^Snorri A. Guðjohnsens/Snorri A. Guðjohnsen<np><ant><m><sg><gen>$
$ echo Snorri Almar Guðjohnsens | lt-proc is-en.automorf.bin
^Snorri Almar Guðjohnsens/Snorri Almar Guðjohnsen<np><ant><m><sg><gen>$
$ echo Snorri U. Guðjohnsens | lt-proc is-en.automorf.bin
^Snorri U. Guðjohnsens/Snorri U. Guðjohnsen<np><ant><m><sg><gen>$
etc.
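The division of labour between the regex and the paradigm fits in a few lines. The regex must match the whole stem, and the paradigm supplies the endings and tags. This Python sketch is a toy model that reproduces the shape of the output above; it is not how lttoolbox works internally. Its ending table is a hypothetical reconstruction of `Snorri_Guðjohns/en__np` that includes only the endings visible in the output: -en for nom/acc/dat and -ens for gen.

```python
import re
from typing import Optional

UPPER = "[A-ZÞÁÐÉÍÓÚÝÖÅ]"
LOWER = "[a-záðéíóúýöå]"

# Stem regex of the "...s" entry; ".?" is assumed to be a literal full stop.
STEM = re.compile(
    rf"{UPPER}{LOWER}+ ({UPPER}{LOWER}+|{UPPER})?"
    rf"\.? ?{UPPER}{LOWER}+s"
)

# Hypothetical paradigm: surface ending -> cases, lemma ending and tags,
# limited to what the lt-proc output above shows.
ENDINGS = {"en": ["nom", "acc", "dat"], "ens": ["gen"]}
LEMMA_ENDING = "en"
TAGS = "<np><ant><m><sg>"


def analyse(surface: str) -> Optional[str]:
    """Return an lt-proc-style ^surface/reading/...$ string, or None."""
    readings = []
    for ending, cases in ENDINGS.items():
        if not surface.endswith(ending):
            continue
        stem = surface[: -len(ending)]
        # Like a dictionary entry, the regex must cover the whole stem.
        if STEM.fullmatch(stem):
            lemma = stem + LEMMA_ENDING
            readings.extend(f"{lemma}{TAGS}<{case}>" for case in cases)
    if not readings:
        return None
    return "^" + surface + "/" + "/".join(readings) + "$"


print(analyse("Snorri U. Guðjohnsens"))
# ^Snorri U. Guðjohnsens/Snorri U. Guðjohnsen<np><ant><m><sg><gen>$
```

A bare surname such as `Guðjohnsen` returns `None`, because the regex requires a first name followed by a space.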
You can also try this yourself...
Fran
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff