Thanks Fran!

>
> http://www.lrec-conf.org/proceedings/lrec2012/pdf/1075_Paper.pdf

Interesting! But see below for a critical view.

>
> 20 hours (very little time!) writing disambiguation rules gives
> substantial improvements.

I have added the reference to page http://wiki.apertium.org/wiki/Constraint_Grammar (External Links).

I just want to call attention to the fact that some of the rules used by these authors could be written in "canonical", CG3-free Apertium as "forbid" rules in .tsx files. For instance, the rule

    REMOVE (DET) IF (1C (VFIN));

corresponds to forbid rules we use in .tsx files (see, e.g., apertium-es-ca.es.tsx) such as:

    <forbid>
      <!-- ... -->
      <label-sequence>
        <label-item label="DETM"/>
        <label-item label="VLEXPFCI"/>
      </label-sequence>
      <!-- ... -->
    </forbid>

We have also (historically) found that investing some time in .tsx rules improves taggers measurably.

> Might help us get around tagging errors like:
>
> $ echo "Avui no veig el sol." | apertium -d . ca-en-tagger
> ^Avui<adv>$ ^no<adv>$ ^veure<vblex><pri><p1><sg>$ ^el<det><def><m><sg>$
> ^sol<adj><m><sg>$^.<sent>$^.<sent>$

Fran, what would be a reasonable "forbidding" rule here that repairs this error but does not break things somewhere else?

>
> $ echo "Why does she do that?" | apertium -d . en-ca-tagger
> ^Why<adv><itg>$ ^do<vbdo><pri><p3><sg>$ ^prpers<prn><subj><p3><f><sg>$
> ^do<vbdo><pres>$ ^that<cnjsub>$^?<sent>$^.<sent>$

I think this could easily be dealt with in "pure", "canonical" Apertium using a simple forbid rule in the .tsx file. The fact that booboos like this one get passed on to the transfer file is a clear indication that the .tsx file in apertium-en-ca needs love, rather than a justification for introducing a non-canonical CG3 module.

I have also added a quick section in http://wiki.apertium.org/wiki/Constraint_Grammar to that effect. You will notice that I make a strong point of not considering CG3 part of canonical or mainstream Apertium (I hope you grant me the right to show a reluctant position here as a creator of the original Apertium!). I make a similar point with respect to HFST, which is clearly non-canonical Apertium.
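
To make the forbid idea above a bit more concrete, here is a toy sketch in Python. It is not how apertium-tagger is implemented (as far as I understand, the tagger uses forbid rules to rule out label bigrams inside its HMM rather than enumerating paths), and the tag names DET, PRN and VFIN, the function allowed_paths and the two-word example are all assumptions made up for illustration. It only shows the effect: a forbidden label pair removes every reading in which those two labels are adjacent.

```python
from itertools import product

# Hypothetical coarse tags and one forbidden bigram, for illustration only;
# real labels are defined per language in the .tsx file.
FORBIDDEN = {("DET", "VFIN")}


def allowed_paths(candidates, forbidden):
    """Return every tag sequence that contains no forbidden bigram.

    candidates: list of lists, the possible tags for each word.
    forbidden:  set of (tag, next_tag) pairs that may not be adjacent.
    """
    paths = []
    for path in product(*candidates):
        bigrams = zip(path, path[1:])
        if not any(pair in forbidden for pair in bigrams):
            paths.append(path)
    return paths


# Toy ambiguity: the first word may be a determiner or a pronoun,
# the second word is unambiguously a finite verb.
sentence = [["DET", "PRN"], ["VFIN"]]
print(allowed_paths(sentence, FORBIDDEN))  # [('PRN', 'VFIN')]
```

With the determiner reading excluded before a finite verb, only the pronoun reading survives, which is the same outcome the REMOVE (DET) IF (1C (VFIN)); rule aims at in this simple case.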

I believe that using CG3 and HFST has effectively hindered reasonable uses of apertium-tagger and perhaps its development, and has also moved all attention away from improving the .metadix format, which has divergent dialects in different language pairs. Call me conservative and radical, but I would rather have seen some development of apertium-tagger and the metadix format, instead of having to spend a long hour installing third-party tools such as OpenFST or vislcg3 on my machine before I can compile a language pair that requires such a Frankenstein configuration, and which would probably not need them if we had developed the core Apertium instead of patching around it.

Currently some language pairs use two different formats for tagger decisions and two different formats for dictionaries. This, in my opinion, is far from ideal, and may be discouraging some Apertiumers. I am currently helping develop apertium-eng-kaz with three Kazakh students, and the complexity shown by this module makes it harder to explain than I thought.

In the past, stubbornly sticking to some design tenets such as "vintage" 70's Unix-style pipelines and text formats has, in my opinion, contributed to having a lean, clear, homogeneous engine. One success of that is the development of multi-level transfer, with all its defects.

That's why I will stubbornly defend canonicality! I hope you get the point.

Cheers

Mikel

>
> Fran
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats.
> http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff

-- 
Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03071 Alacant, Spain
Phone: +34 96 590 9776
Fax: +34 96 590 9326
