On 26 January 2011 21:23, Francis Tyers <[email protected]> wrote:
> On Wed, 26 Jan 2011 at 21:19 +0000, Jimmy O'Regan wrote:
>> On 26 January 2011 11:59, Francis Tyers <[email protected]> wrote:
>> > Hey all,
>> >
>> > Translating some text from Catalan to Spanish I get a tagging error:
>> >
>> > --
>> >
>> > o qualsevol altre traductor automàtic
>> >
>> > $ echo "o qualsevol altre traductor automàtic" | apertium -d . ca-es-anmor
>> > ^o/o<cnjcoo>$
>> > ^qualsevol/qualsevol<adj><mf><sg>/qualsevol<prn><tn><mf><sg>/qualsevol<det><ind><mf><sg>$
>> > ^altre/altre<adj><ind><m><sg>/altre<det><ind><m><sg>$
>> > ^traductor/traductor<n><m><sg>$
>> > ^automàtic/automàtic<adj><m><sg>$^./.<sent>$
>> >
>> > $ echo "o qualsevol altre traductor automàtic" | apertium -d . ca-es-tagger
>> > ^o<cnjcoo>$ ^qualsevol<prn><tn><mf><sg>$ ^altre<det><ind><m><sg>$
>> > ^traductor<n><m><sg>$ ^automàtic<adj><m><sg>$^.<sent>$
>> >
>> > o cualquiera otro traductor automático
>> >
>> > --
>> >
>> > I think here it should choose 'qualsevol' (determiner) as opposed to the
>> > pronoun. But it could also be that I have an error in my Catalan. Could
>> > someone who knows Catalan/Spanish well check this out?
>> >
>> > A couple of rules might be
>> >
>> > FORBID prn.tn + adj.ind
>> > FORBID prn.tn + det.ind
>> >
>>
>> Can't work. The forbid rules are not rules, per se, they just insert a
>> number approaching 0 as the probability of that bigram (which is
>> P(w2|w1), while you're talking about P(w1|w2)... FWIW, in the cs-pl
>> draft, I'd put something along the lines of 'the Markov assumption
>> that a word can be disambiguated solely in terms of left context does
>> not always hold true', but I was told that was a 'bold statement' and
>> left it out).
>
> Eckhard says stuff like that all the time, maybe you need to move to
> Denmark?
>

Ah... ok, now I see why it could sound 'bold'. No, in a bigram setting,
P(w2|w1) is much more reasonable, and for languages like English, trigrams
based on P(w3|w1,w2) are fairly reasonable too. But for Czech (etc.)
P(w2|w1,w3) is much better: there are many situations, especially with
soft-stemmed adjectives, where the following word is often the only
disambiguating context. Hunpos, btw, is configurable for either.

> Also, what do you think of adding 'qualsevol' as a predet?
>

Seems reasonable. I'm relatively sure that would not be a new ambiguity
class, but it'd be worth checking.

-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
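
As a footnote to the FORBID discussion quoted above, here is a rough Python sketch of what "insert a number approaching 0 as the probability of that bigram" could look like for a table of P(t2|t1) values. It is not Apertium's tagger code: the function name `apply_forbid`, the `EPSILON` value, the row renormalisation and all the probabilities are assumptions made up for illustration. Only the tag names come from the thread.

```python
# Toy sketch: a FORBID rule modelled as a near-zero bigram transition
# probability. This is NOT Apertium's implementation; names and numbers
# below are hypothetical.

EPSILON = 1e-9  # assumed stand-in for "a number approaching 0"


def apply_forbid(transitions, forbidden, epsilon=EPSILON):
    """Return a copy of `transitions` (a dict of P(t2 | t1) rows) in which
    every forbidden (t1, t2) pair is set to `epsilon`, with each row then
    renormalised so that it still sums to 1 (renormalising is an assumption)."""
    result = {t1: dict(row) for t1, row in transitions.items()}
    for t1, t2 in forbidden:
        if t1 in result and t2 in result[t1]:
            result[t1][t2] = epsilon
    for t1, row in result.items():
        total = sum(row.values())
        result[t1] = {t2: p / total for t2, p in row.items()}
    return result


# Hypothetical P(t2 | t1) values, not taken from any trained model.
transitions = {
    "prn.tn": {"det.ind": 0.3, "adj.ind": 0.2, "n": 0.5},
    "det.ind": {"det.ind": 0.2, "adj.ind": 0.1, "n": 0.7},
}

# The two rules proposed in the thread, as (left tag, right tag) pairs.
forbidden = [("prn.tn", "adj.ind"), ("prn.tn", "det.ind")]

smoothed = apply_forbid(transitions, forbidden)
```

Note that the table is indexed by the left tag: each forbidden pair only changes the row for `prn.tn`, i.e. the distribution over what may follow it, which is the P(w2|w1) direction mentioned above.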
