Re: [Apertium-stuff] Questions about lexical selection

Hèctor Alòs i Font Mon, 20 Dec 2021 22:26:41 -0800

Message from Daniel Swanson <awesomeevildu...@gmail.com> on Tue, 21 Dec
2021 at 7:57:


> Hi Greg,
>
> The file where you want to write rules for this is
> https://github.com/apertium/apertium-pol/blob/master/apertium-pol.pol.rlx
>
> If you want something like "tacy is <det> before <n>", you could get that
> with
>
> SELECT DET IF (0 DET) (0 NOUN) (1 NOUN) ;
>

The problem with this rule is that (1 NOUN) is not necessarily a noun, but
something that can be analysed as a noun at the moment this rule is
executed. Similarly, the 0 word may be correctly analysed as something
else, like an adjective. So, a more cautious rule can be, for instance:

REMOVE NOUN IF (0 DET) (0 NOUN) (1C NOUN) ;

The problem with this alternative variant of the rule is that it matches
less often than the first one, because (1C NOUN) only holds when every
remaining reading of the next word is a noun. It may not solve cases that
Daniel's version solves, although it probably makes fewer wrong decisions.
Your knowledge of the language, and testing on a corpus, should help you
decide which is better, or maybe you will choose something in between.
Tuning can be done by adding a few rules, placed before the general one,
for frequent words/cases.
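
As a rough illustration of that kind of tuning, here is a minimal sketch,
not a tested grammar. The set names (DET, NOUN, PL, NOM) are assumptions
based on the tags quoted in this thread; the real .rlx file may already
define them differently. The word-specific rule is hypothetical: it assumes
that a noun following the determiner "tacy" can be read as plural
nominative, which should be checked against a corpus.

```
# Assumed set definitions; adapt to the ones already present in the file.
LIST DET = det ;
LIST NOUN = n ;
LIST PL = pl ;
LIST NOM = nom ;

SECTION

# Hypothetical word-specific rule, placed before the general one:
# for the wordform "tacy", keep the determiner reading when the next
# word has a reading as a plural nominative noun.
SELECT DET IF (0 ("<tacy>"i)) (1 NOUN + PL + NOM) ;

# General, cautious rule: drop the noun reading of a det/noun-ambiguous
# word only when the next word is unambiguously a noun.
REMOVE NOUN IF (0 DET) (0 NOUN) (1C NOUN) ;
```

The specific rule takes the less cautious (1 ...) test only for one frequent
word, while everything else still goes through the careful (1C NOUN) check.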

Hèctor


>
> Daniel
>
> On Mon, Dec 20, 2021 at 1:40 PM Grzegorz Kulik <gregorykku...@gmail.com>
> wrote:
> >
> > Hello all,
> >
> > I haven't contacted you for some time, I hope you are all well. I
> > developed the pol-szl pair and although the translation is quite
> > reasonable, I decided to make it better by improving the lexical selection.
> > I've been reading the documentation and managed to write several rules for
> > forms that need disambiguation and are the same parts of speech. However, I
> > cannot find any information anywhere about what to do if there is a form
> > that can mean two completely different things. Example in Polish:
> >
> > tacy (such) = taki<det><dem><mp><pl><nom>
> > tacy (of a tablet) = taca<n><f><sg><gen>/taca<n><f><sg><dat>/taca<n><f><sg><loc>
> >
> > The first meaning is obviously much more frequent but the translator
> > chooses the second one, which is less than desirable.
> >
> > What can I do to remedy this? Can I write rules for that manually?
> > Should I train the tagger? If so, what method would be the best? There's
> > multiple training methods and I don't know which one to choose for my pair.
> > Could you recommend me the best approach?
> >
> > Thank you in advance
> > Greg
> > _______________________________________________
> > Apertium-stuff mailing list
> > Apertium-stuff@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>
