2011/1/27 Francis Tyers <[email protected]>

>
>
> I've added "Windows" as a proper noun, "do so" is difficult as I think
> it should translate as "hacerlo", but then the pronoun moves if it is
> finite, e.g. "lo hará", added "more than half of" as a determiner (as
> with "a shitload of")
>
>  1) More than half of software developers are already building
>     applications for Windows 7 and nearly 80% will do so within the
>     next year, a new survey has found.
>
> web) Más que medio de software developers ya está construyendo
>     aplicaciones para Ventanas 7 y nearly 80% hará tan dentro del año
>     próximo, una encuesta nueva ha encontrado.
>
> svn) Más de la mitad de desarrolladores de software ya están
>     construyendo aplicaciones para Windows 7 y casi 80% hará tan dentro
>     del el año que viene, una encuesta nueva ha encontrado.
>
>

As you know, proper names are a mess, so we have to store large numbers of
them to avoid looking ridiculous. Unfortunately I have not yet found a list
of trademarks such as Windows, only company names (from the Fortune top 500
list), first names, family names and place names (the last from
French-speaking and Catalan-speaking countries). So Pierre Noël had better
not be translated as e.g. Stone Christmas :-) (by the way, I still have lots
of problems with the French "Marie" being analysed as a verb, while
fortunately "Paris" is no longer taken as a common noun)

You can find the lists I have collected at
https://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-eo-fr/lexic/
https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-eo-ca/lexic/
(but I have not been able to process them fully yet)

(For companies I got the Fortune top 500 list, but e.g.
http://fr.transnationale.org/ is said to have 7,000 names; I don't know
whether they can be obtained somehow)

It would be better for all of us if we could put lexical resources together
somewhere, to make them easier to find and reuse in the translators (maybe a
directory on sourceforge exists for that, but I don't know). I would prefer
plain files with a clear explanation of their content rather than
Apertium-formatted stuff, because each translator may or may not want to
differentiate first names from family names, or masculine first names from
feminine ones; someone may or may not want to add the 5,000 most frequent
Catalan family names (from Idescat) to his/her X-Y translator, and so on.
From plain text files, anyone can quickly generate Apertium-like files.
Lexical resources should also be referenced in the Apertium wiki.
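
As a rough sketch of that last step, assuming a plain file with one
"name<TAB>category" pair per line: the category labels, the "Pierre__np"
paradigm name and the function names below are hypothetical, and only
"Andorre__np" comes from a real dictionary. Each translator would plug in
its own mapping.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical mapping from a category label in the plain file to a
# paradigm name. Only "Andorre__np" appears in this message; the rest
# is an assumption for illustration.
PARADIGMS = {
    "place": "Andorre__np",
    "first_name": "Pierre__np",
}


def to_dix_entry(name, category):
    """Build one monolingual dictionary entry for a proper noun."""
    paradigm = PARADIGMS[category]
    # Spaces inside a multiword name are written as <b/> in .dix files.
    stem = escape(name).replace(" ", "<b/>")
    return f"<e lm={quoteattr(name)}><i>{stem}</i><par n={quoteattr(paradigm)}/></e>"


def convert(lines):
    """Turn 'name<TAB>category' lines into a list of entries.

    Blank lines and lines starting with '#' are skipped.
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, category = line.split("\t", 1)
        entries.append(to_dix_entry(name, category))
    return entries


if __name__ == "__main__":
    for entry in convert(["# sample list", "Paris\tplace", "Pierre\tfirst_name"]):
        print(entry)
```

Keeping the category in the plain file and the paradigm choice in the
script is what lets one translator merge first names and family names
while another keeps them apart.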

By the way, has any of us already used Wikipedia as a resource for huge
amounts of proper nouns (possibly with translations into other languages)?

In connection with proper noun translation, we have a problem in Apertium
when dealing with regular expressions. If I include, for instance, these
two regular expressions (in just one dictionary):
<e>       <re>Saint\-[A-Z][a-z]+</re><i></i><par n="Andorre__np"/></e>
<e>       <re>Sainte\-[A-Z][a-z]+</re><i></i><par n="Andorre__np"/></e>
the compilation (lt-comp) slows down from 12 s to 34 s on my computer. If
the expression is more complicated (in order to deal with non-English
characters) it's even worse. I simply can't use them in most cases.
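
To illustrate why the ASCII-only expressions are not enough, here is a
small check written with Python's re module. It is only a sketch: Python's
regex semantics are not those of lt-comp, the wider character ranges are an
assumption (the Latin-1 accented letters), and the "Sainte?" shorthand is
used here just to cover both expressions at once.

```python
import re

# Python-only shorthand covering both "Saint-" and "Sainte-" expressions.
ASCII_ONLY = re.compile(r"Sainte?-[A-Z][a-z]+")

# Assumed wider classes: ASCII letters plus the Latin-1 accented letters
# (U+00C0..U+00DE upper case, U+00DF..U+00FF lower case, minus the
# multiplication and division signs).
UPPER = "A-ZÀ-ÖØ-Þ"
LOWER = "a-zß-öø-ÿ"
EXTENDED = re.compile(rf"Sainte?-[{UPPER}][{LOWER}]+")


def matches(pattern, word):
    """Return True if the whole word matches the pattern."""
    return pattern.fullmatch(word) is not None


if __name__ == "__main__":
    for word in ["Saint-Denis", "Saint-Étienne", "Sainte-Geneviève"]:
        print(word, matches(ASCII_ONLY, word), matches(EXTENDED, word))
```

Names such as Saint-Étienne or Sainte-Geneviève only match the extended
version, which is the kind of larger expression that makes the compilation
slower still.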

Regards,
Hèctor