Hello,
On Thu, 27 Jan 2011 at 08:03 +0100, Mikel Forcada wrote:
> Alternative: Instead of fiddling with "qualsevol"'s part of speech, 
> there is another solution. If the problem is only with
> 
> "qualsevol altre"
> "qualsevol altra"
> "qualssevol altres"
> 
> Just add them as multiwords and name them "det" as we do with "el meu", 
> "la meva", etc. How does that sound?

It sounds good. But there are some other cases:
echo "Els meus altres projectes. El meu altre projecte" | apertium ca-es
Mis otros proyectos
El mío otro proyecto

apertium -d . ca-es-anmor
^Els meus/El meu<det><pos><m><pl>/El meu<prn><tn><pos><m><pl>$
^altres/altre<adj><ind><mf><pl>/altre<det><ind><mf><pl>$
^projectes/projecte<n><m><pl>/projectar<vblex><pri><p2><sg>/projectar<vblex><prs><p2><sg>$
^El meu/El meu<det><pos><m><sg>/El meu<prn><tn><pos><m><sg>$
^altre/altre<adj><ind><m><sg>/altre<det><ind><m><sg>$
^projecte/projecte<n><m><sg>/projectar<vblex><pri><p1><sg>/projectar<vblex><prs><p3><sg>/projectar<vblex><prs><p1><sg>/projectar<vblex><imp><p3><sg>$^./.<sent>$

apertium -d . ca-es-tagger
^El meu<det><pos><m><pl>$ ^altre<adj><ind><mf><pl>$ ^projecte<n><m><pl>$
^El meu<prn><tn><pos><m><sg>$ ^altre<det><ind><m><sg>$
^projecte<n><m><sg>$^.<sent>$


In some cases the tagger chooses the combination det + adj.ind, whereas
in others it chooses prn + det.ind. The first one translates correctly
into Spanish ("Mis otros proyectos"); the second one doesn't ("El mío
otro proyecto").
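
To spot this pattern in larger amounts of tagger output, something along
these lines might help. It is only a rough sketch: it assumes the tagger
output is available as a plain string in the stream format shown above,
and the function names (parse_tagged, find_prn_before_ind_det) are made
up for this example, not part of Apertium.

```python
import re

# One lexical unit in the tagger output: ^lemma<tag><tag>...$
LEXICAL_UNIT = re.compile(r"\^([^<$/]+)((?:<[^>]+>)+)\$")

def parse_tagged(stream):
    """Return a list of (lemma, [tags]) pairs from tagger output."""
    units = []
    for lemma, tags in LEXICAL_UNIT.findall(stream):
        units.append((lemma, re.findall(r"<([^>]+)>", tags)))
    return units

def find_prn_before_ind_det(units):
    """Return (lemma1, lemma2) pairs where a prn is followed by a det.ind."""
    hits = []
    for (lemma1, tags1), (lemma2, tags2) in zip(units, units[1:]):
        if tags1[:1] == ["prn"] and tags2[:2] == ["det", "ind"]:
            hits.append((lemma1, lemma2))
    return hits

# Tagger output copied from the example above
tagged = (
    "^El meu<det><pos><m><pl>$ ^altre<adj><ind><mf><pl>$ ^projecte<n><m><pl>$ "
    "^El meu<prn><tn><pos><m><sg>$ ^altre<det><ind><m><sg>$ "
    "^projecte<n><m><sg>$^.<sent>$"
)
print(find_prn_before_ind_det(parse_tagged(tagged)))
# prints [('El meu', 'altre')]
```

Only the singular "El meu altre" is reported, which matches the tagging
error described above; the plural is tagged det + adj.ind and is skipped.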

Another example:
apertium ca-es
cap altre dia
ninguno otro día

It seems the problem arises only when "altre" appears as the second
determiner. The dictionary already has some combinations with "altre":
"un altre" and "molts altres" are multiwords, each with two entries, one
as pronoun and the other as determiner.
So maybe it would be good to add "el meu altre" (det), "qualsevol
altre" (det and prn) and "cap altre" (det and prn), with all their
variations:

"qualsevol altre"
"qualsevol altra"
"qualssevol altres"

"el meu altre" - the only one with tagger error
"la meva altra"   (no tagging error)
"els meus altres"  (no tagging error)
"les meves altres"   (no tagging error)

"cap altre"
"cap altra" (no tagging error)

What do you think?



> 
> By the way, whatever is done, it should be the same for Spanish 
> "cualquier otro", "cualesquier otros", etc.
> 
> Mikel
> 
> 
> On 01/26/2011 10:46 PM, Jimmy O'Regan wrote:
> > On 26 January 2011 21:23, Francis Tyers<[email protected]>  wrote:
> >> On Wed, 26 Jan 2011 at 21:19 +0000, Jimmy O'Regan wrote:
> >>> On 26 January 2011 11:59, Francis Tyers<[email protected]>  wrote:
> >>>> Hey all,
> >>>>
> >>>> Translating some text from Catalan to Spanish I get a tagging error:
> >>>>
> >>>> --
> >>>>
> >>>> o qualsevol altre traductor automàtic
> >>>>
> >>>> $ echo "o qualsevol altre traductor automàtic" | apertium -d . ca-es-anmor
> >>>> ^o/o<cnjcoo>$
> >>>> ^qualsevol/qualsevol<adj><mf><sg>/qualsevol<prn><tn><mf><sg>/qualsevol<det><ind><mf><sg>$
> >>>>  ^altre/altre<adj><ind><m><sg>/altre<det><ind><m><sg>$ 
> >>>> ^traductor/traductor<n><m><sg>$ 
> >>>> ^automàtic/automàtic<adj><m><sg>$^./.<sent>$
> >>>>
> >>>> $ echo "o qualsevol altre traductor automàtic" | apertium -d . ca-es-tagger
> >>>> ^o<cnjcoo>$ ^qualsevol<prn><tn><mf><sg>$ ^altre<det><ind><m><sg>$
> >>>> ^traductor<n><m><sg>$ ^automàtic<adj><m><sg>$^.<sent>$
> >>>>
> >>>> o cualquiera otro traductor automático
> >>>>
> >>>> --
> >>>>
> >>>> I think here it should choose 'qualsevol' (determiner) as opposed to the
> >>>> pronoun. But it could also be that I have an error in my Catalan. Could
> >>>> someone who knows Catalan/Spanish well check this out ?
> >>>>
> >>>> A couple of rules might be
> >>>>
> >>>>   FORBID prn.tn + adj.ind
> >>>>   FORBID prn.tn + det.ind
> >>>>
> >>> Can't work. The forbid rules are not rules, per se, they just insert a
> >>> number approaching 0 as the probability of that bigram (which is
> >>> P(w2|w1), while you're talking about P(w1|w2)... FWIW, in the cs-pl
> >>> draft, I'd put something along the lines of 'the Markov assumption
> >>> that a word can be disambiguated solely in terms of left context does
> >>> not always hold true', but I was told that was a 'bold statement' and
> >>> left it out).
> >> Eckhard says stuff like that all the time, maybe you need to move to
> >> Denmark ?
> >>
> > Ah... ok, now I see why it could sound 'bold'. No, in a bigram
> > setting, P(w2|w1) is much more reasonable, and for languages like
> > English trigrams based on P(w3|w1,w2) are fairly reasonable too, but
> > for Czech (etc.) P(w2|w1,w3) is much better (there are many
> > situations, especially with soft-stemmed adjectives, where the
> > following word is often the only disambiguating context). Hunpos, btw,
> > is configurable for either.
> >
> >> Also, what do you think of adding 'qualsevol' as a predet ?
> >>
> > Seems reasonable. I'm relatively sure that would not be a new
> > ambiguity class, but it'd be worth checking.
> >
> 
> 



_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
