Adding e.g. '<m>: RestOfParadigm ; Dir/RL' to Lexicon ANT-F (sic) and
vice versa can be a solution to this, that is, allowing possibly
erroneous forms to be generated. This won't require fiddling with
transfer. I remember adding an 'iv' tag with the 'only generate'
restriction to the TV lexicon in one of the Turkic monodixen, although
in that case the source of the error should ideally have been fixed.
Best,
Ilnar
On 04.02.2021 01:05, Bernard Chardonneau wrote:
Answer below.
Date: Wed, 3 Feb 2021 18:10:42 +0300
From: Hèctor Alòs i Font <hectora...@gmail.com>
To: "[apertium-stuff]" <apertium-stuff@lists.sourceforge.net>
Reply-To: apertium-stuff@lists.sourceforge.net
Subject: Re: [Apertium-stuff] Proper noun classification considered harmful
Message from Kevin Brubeck Unhammer <unham...@fsfe.org> on Wed, 3 Feb
2021 at 0:40:
> Hèctor Alòs i Font <hectoralos-re5jqeeqqe8avxtiumw...@public.gmane.org>
> wrote:
>
> > I am more sceptical about the need to distinguish between toponyms and
> > hydronyms. In some languages one will have an article and the other will
> > not, but these are rare cases. On the other hand, we do not distinguish
> > between countries (or regions) and cities, which in French is quite
> > important both for generating the article and the preposition preceding
> > it, if you translate from Catalan or Spanish: for instance, "New-York" is
> > the city, but "le New-York" is the state, so we will have "à New-York" or
> > "au New-York" for "in New-York" (or "à Paris" but "en France"). The
> > generation of articles may also differ depending on whether "Barcelona"
> > stands for the city or the (football or whatever) team, and the gender
> > often differs too. So, are we then going to create more and more subtypes
> > ad nauseam? Better not!
> >
> > In short, we may find specific cases in certain pairs that make us think
> > that some distinctions are appropriate, but adding them to monolingual
> > dictionaries and forcing them to be maintained for all languages seems
> > doubtful to me.
>
> So the city-vs-region distinction is only useful for target (structural)
> generation, not source analysis/disambiguation/anaphora. I think that
> can be a good guide to when something should be in monodixen or not.
>
I am not sure I see your point. Let's take the example of New-York in
French. The city is "New-York", without any article, but the state is
"le New-York". In some cases the prepositions used for the two also
differ (and such cases come up often in Wikipedia texts). So they behave
differently in French. In principle, it makes sense to differentiate
them in the monodix... although I have preferred not to innovate too
much and, as you suggest, I've used long def-lists in the transfer files.
In fact, while we say "la Floride" and "le Texas", we rather say
"l'état de New-York", certainly to distinguish it from the town.
But in any case, np genders in French are used for countries, states,
rivers...
And in the sentence "Dans le New-York de 1976, j'ai vu un skateboarder"
("In the New York of 1976, I saw a skateboarder"), we are speaking of the
town. "La Rome antique ..." is also about a town.
Tags for analysis are used to describe a word as fully as possible. So
there may be a gender for the source language, but it will not always
be used in the target language. For instance, Esperanto and English
generally don't use it. Likewise, the accusative tag in the analysis of
an Esperanto sentence doesn't need to be passed on to generation for
many target languages.
In French, adjectives generally come after the noun, but a few of them
come before it. So, for the epo-fra translation direction, I chose to
put <preadj> tags on these adjectives in the bidix, with 2 distinct
lines for the LR and RL directions. In the transfer rule, <preadj> means
we need to reorder the words, but it is not passed on to the generation
input, so the monodix doesn't need to be changed on the generation side.
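A rough sketch of that idea, assuming a hypothetical (lemma, tags) token representation rather than Apertium's real transfer rules: a flagged adjective is moved before the noun, and the <preadj> tag is stripped so generation never sees it. The function names order_adj_noun and to_stream are illustrative only; the ^lemma<tag>$ rendering just mimics the stream notation.

```python
from typing import List, Tuple

# Hypothetical token shape: (lemma, tags). Real Apertium transfer uses its
# own stream format and XML rules; this only mirrors the reordering idea.
Token = Tuple[str, Tuple[str, ...]]

def order_adj_noun(adj: Token, noun: Token) -> List[Token]:
    """Order an adjective-noun pair for French output."""
    adj_lemma, adj_tags = adj
    # <preadj> is a transfer-only flag: strip it so generation never sees it.
    clean_adj = (adj_lemma, tuple(t for t in adj_tags if t != "preadj"))
    if "preadj" in adj_tags:
        return [clean_adj, noun]   # flagged adjectives go first: "grande maison"
    return [noun, clean_adj]       # default French order: "maison rouge"

def to_stream(tokens: List[Token]) -> str:
    """Render tokens in an Apertium-like ^lemma<tag>...$ notation."""
    return " ".join(
        "^" + lemma + "".join(f"<{t}>" for t in tags) + "$"
        for lemma, tags in tokens
    )

print(to_stream(order_adj_noun(("grand", ("adj", "preadj", "f", "sg")),
                               ("maison", ("n", "f", "sg")))))
# prints: ^grand<adj><f><sg>$ ^maison<n><f><sg>$
```

The point of the design is that the French monodix only ever receives ordinary <adj> tags, whichever pair produced them.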
I think it is the same for NP genders. If a language like French is the
target language, it will be useless to put them only in the monodix
(because that will do nothing except potentially break generation). So
they need to be in the bidix or somewhere in the transfer rules. And
even if we choose to allow a gender tag for generation, generation
without this tag will also need to work, for compatibility with other
pairs. So in that case the gender indicated in the bidix will be useful
to set the gender of a determiner and possibly an adjective, but it will
be simpler not to pass it on for the generation of the NP.
On the other hand, information about the category of place in the source
language, needed to get a correct translation such as "en France" or "à
Paris", may be useful, even if it is used only in a few language pairs.
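To show why the place category matters for French generation, here is a simplified sketch of how it could drive the choice of preposition. The category labels ("city", "country", "state", "region") and the function locative_preposition are hypothetical, not existing Apertium tags or APIs, and the rule deliberately ignores exceptions such as cities with an article ("au Havre") or islands.

```python
def locative_preposition(category: str, gender: str, number: str = "sg",
                         vowel_initial: bool = False) -> str:
    """Pick the French preposition for 'in/to <place>' (simplified rule)."""
    if category == "city":
        return "à"       # à Paris, à New-York (the city)
    if number == "pl":
        return "aux"     # aux États-Unis
    if gender == "f" or vowel_initial:
        return "en"      # en France, en Iran
    return "au"          # au Texas, au New-York (the state)

print(locative_preposition("city", "m"), "Paris")
print(locative_preposition("country", "f"), "France")
```

Note that the gender alone is not enough: "New-York" needs the city/state distinction to choose between "à" and "au".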
Another MT project (Grammatical Framework) tries to give as much
information as possible about every word of a language to allow simpler
translations. But maybe it is not natural to describe whether an NP is
a town, region, country, river, lake, trademark, person, ... when this
information does not change anything in the syntax of the language.
So an alternative could be to add extra files in the language branch for
when this language is the target language. These files (wordlists) could
be used in transfer without making the bidixes more complicated, so the
same file could be written once and used in many language pairs. But if
the wordlist is long, I don't know whether that would degrade transfer
speed compared to adding this information in every bidix for which it
is useful.
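A minimal sketch of such a shared wordlist, assuming a hypothetical tab-separated "lemma<TAB>category" file format (no such format exists in Apertium today). Loaded once into a hash table, a lookup costs roughly the same however long the list is; whether apertium-transfer's own list handling behaves similarly is a separate question this sketch does not answer.

```python
from typing import Dict, Iterable, Set

def load_place_wordlist(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Parse a hypothetical 'lemma<TAB>category' wordlist into a lookup table."""
    table: Dict[str, Set[str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        lemma, category = line.split("\t")
        # A lemma may have several categories (New-York: city and state).
        table.setdefault(lemma, set()).add(category)
    return table

SAMPLE = """\
# lemma\tcategory
Paris\tcity
France\tcountry
New-York\tcity
New-York\tstate
"""

places = load_place_wordlist(SAMPLE.splitlines())
print(sorted(places["New-York"]))  # prints: ['city', 'state']
```

Keeping a set per lemma makes the ambiguity explicit, so transfer rules would still have to decide between readings (e.g. from a following "de 1976" or a preceding article).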
--------------------------------
Bernard Chardonneau (France)
Phone : [33] 9 72 36 32 90
GSM phone : [33] 7 69 46 16 31
An alternative Apertium translation website :
http://apertiumtrad.tuxfamily.org
Multilingual websites for my free softwares :
http://libremail.free.fr and http://libremail.tuxfamily.org
http://cyloop.tuxfamily.org (mainly translated with Apertium)
My general website (in french only)
http://bech.free.fr
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff