Adding e.g. '<m>: RestOfParadigm ; Dir/RL' to Lexicon ANT-F (sic), and vice versa, can be a solution to this; that is, it allows possibly erroneous forms to be generated. This won't require fiddling with transfer.

I remember adding an 'iv' tag with the 'only generate' restriction to the TV lexicon in one of the Turkic monodixen, although in that case the source of the error should ideally have been fixed.



On 04.02.2021 at 01:05, Bernard Chardonneau wrote:
Answer below.

Date: Wed, 3 Feb 2021 18:10:42 +0300
From: Hèctor Alòs i Font <>
To: "[apertium-stuff]" <>
Subject: Re: [Apertium-stuff] Proper noun classification considered harmful
Message from Kevin Brubeck Unhammer <> on Wed, 3 Feb 2021 at 0:40:

> Hèctor Alòs i Font <> wrote:
> > I am more sceptical about the need to distinguish between toponyms and
> > hydronyms. In some languages one will have an article and the other will
> > not, but these are rare cases. On the other hand, we do not distinguish
> > between countries (or regions) and cities, which in French is quite
> > important both for generating the article and the preposition preceding
> > it, if you translate from Catalan or Spanish: for instance, "New-York" is
> > the city, but "le New-York" is the state, so will have "à New-York" or
> > "au New-York" for "in New-York" (or "à Paris" but "en France"). The
> > generation of articles may also not be the same whether "Barcelona"
> > stands for the city or the (football or whatever) team, nor is the gender
> > often the same.
> > So, are we then going to create more and more subtypes ad nauseam? Better
> > not!
> >
> > In short, we can find casuistries in certain pairs that may make us think
> > that some distinctions are appropriate, but adding them in monolingual
> > dictionaries and forcing them to be maintained for all languages seems
> > doubtful to me.
>
> So the city-vs-region distinction is only useful for target (structural)
> generation, not source analysis/disambiguation/anaphora. I think that
> can be a good guide to when something should be in monodixen or not.

I am not sure I see your point. Let's look at the example of New-York in
French. The city is "New-York", without any article, but the state is
"le New-York". The prepositions used are different in some cases (which
come up often in Wikipedia texts). So they behave differently in French.
In principle, it makes sense to differentiate them in the monodix...
although I have preferred not to innovate too much and, as you suggest,
I've used long def-lists in the transfer files.

In fact, although we say "la Floride" and "le Texas", we would rather say
"l'état de New-York", certainly to distinguish it from the town.

But in any case, NP genders in French are used for countries, states, rivers ...

And in the sentence "Dans le New-York de 1976, j'ai vu un skateboarder", we
are talking about the town. "La Rome antique ..." is also about a town.

Analysis tags are used to describe a word as fully as possible. So there may
be a gender in the source language, but it will not always be used in the
target language. For instance, Esperanto and English generally don't use it.
And the accusative tag in the analysis of an Esperanto sentence doesn't need
to be passed on to generation for a lot of target languages.

In French, adjectives generally come after the noun, but a few of them come
before it. So, for the epo-fra translation direction, I chose to put <preadj>
tags on these adjectives in the bidix, with 2 distinct lines for the LR and
RL directions.

In the transfer rule, the <preadj> tag means we need to reorder words, but it
is not passed on to the generation input, so the monodix doesn't need to be
changed on the generation side.
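
As a rough illustration of that idea (a Python sketch, not Apertium's transfer language): the function name, the lexical-unit strings and the tag order below are all hypothetical. It also assumes one particular reading of the tag, namely that an adjective carrying <preadj> is placed before the noun in French and any other adjective after it. The point it shows is only that the tag steers word order and is removed before the generator sees the word.

```python
from typing import List

# Hypothetical marker, written the way the tag appears in the message.
PREADJ = "<preadj>"


def order_adj_noun(adj: str, noun: str) -> List[str]:
    """Return the target-side lexical units in output order.

    Assumption (not confirmed by the message): an adjective tagged
    <preadj> goes before the noun, any other adjective goes after it.
    """
    if PREADJ in adj:
        # The tag has served its purpose in transfer; strip it so the
        # generator receives a form the monodix already knows.
        return [adj.replace(PREADJ, ""), noun]
    return [noun, adj]


if __name__ == "__main__":
    print(order_adj_noun("grand<adj><preadj><f><sg>", "maison<n><f><sg>"))
    print(order_adj_noun("rouge<adj><mf><sg>", "maison<n><f><sg>"))
```

Because the tag is dropped inside the rule, nothing on the generation side has to know that <preadj> exists.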

I think it is the same for NP genders. If a language like French is the
target language, it is useless to put the gender only in the monodix (that
would do nothing except potentially break generation). So it needs to be in
the bidix or somewhere in the transfer rules. And even if we choose to allow
a gender tag for generation, generation without this tag will also need to
work, for compatibility with other pairs. So in that case the gender
indicated in the bidix will be useful for setting the gender of a determiner
and possibly an adjective, but it will be simpler not to pass it on for the
generation of the NP.

On the other hand, information about the category of place in the source
language, needed to get a correct translation such as "en France" or
"à Paris", may be useful even if it is used in only a few language pairs.
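
To make this concrete, here is a small Python sketch (not Apertium code). The category labels "city", "state" and "country" and the function name are hypothetical, and the table deliberately contains only the examples quoted in this thread. It shows that the same name can need a different preposition depending on its category.

```python
from typing import Dict, Optional, Tuple

# (name, hypothetical category) -> French locative preposition.
# Only the examples mentioned in this thread are listed.
PREPOSITION: Dict[Tuple[str, str], str] = {
    ("Paris", "city"): "à",
    ("France", "country"): "en",
    ("New-York", "city"): "à",
    ("New-York", "state"): "au",
}


def locative(name: str, category: str) -> Optional[str]:
    """Return the French phrase for "in <name>", or None if unknown."""
    prep = PREPOSITION.get((name, category))
    if prep is None:
        # No information for this pair: leave the decision to other rules.
        return None
    return prep + " " + name


if __name__ == "__main__":
    print(locative("New-York", "city"))
    print(locative("New-York", "state"))
    print(locative("France", "country"))
```

Without the category, the two New-York entries could not be told apart.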

Another MT project (Grammatical Framework) tries to give as much information as possible about every word of a language, to allow simpler translations.

But maybe it is not natural to think of describing whether an NP refers to a
town, region, country, river, lake, trademark, person, ... when this
information does not change anything in the syntax of that language.

So, an alternative possibility would be to add extra files in the language
branch for when this language is the target language. These files
(wordlists) could be used in transfer without making the bidixes more
complicated. So the same file could be written once and used in a lot of
language pairs. But if the wordlist is long, I don't know if that would
degrade transfer speed compared to adding this information in any bidix for
which it is
Bernard Chardonneau (France)
