Thanks Fran!
>
> http://www.lrec-conf.org/proceedings/lrec2012/pdf/1075_Paper.pdf
Interesting! But see below for a critical view.
>
> 20 hours (very little time!) writing disambiguation rules gives
> substantial improvements.
I have added the reference to page 
http://wiki.apertium.org/wiki/Constraint_Grammar (External Links).

I just want to call attention to the fact that some of the rules 
used by these authors could be written in "canonical", CG3-free Apertium 
as "forbid" rules in .tsx files.

For instance, the rule

REMOVE (DET) IF (1C (VFIN));

corresponds to forbid rules we use in .tsx files (see, e.g. 
apertium-es-ca.es.tsx) such as:

<forbid>
<!-- ... -->
<label-sequence>
<label-item label="DETM"/>
<label-item label="VLEXPFCI"/>
</label-sequence>
<!-- ... -->
</forbid>
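
As a reminder, the labels used in a forbid rule are coarse tags declared 
in the <tagset> section of the same .tsx file. A minimal sketch of such 
declarations follows; the tag patterns are assumptions made for 
illustration and are not copied from apertium-es-ca.es.tsx, so check the 
actual file before reusing them:

```xml
<tagset>
  <!-- Hypothetical definition: masculine determiners. -->
  <def-label name="DETM" closed="true">
    <tags-item tags="det.*.m.*"/>
  </def-label>
  <!-- Hypothetical definition: some finite forms of lexical verbs. -->
  <def-label name="VLEXPFCI">
    <tags-item tags="vblex.pri.*"/>
    <tags-item tags="vblex.fti.*"/>
  </def-label>
</tagset>
```

A forbid rule then refers to these labels by name, so how well it 
approximates a CG rule such as the one above depends on how finely the 
tagset is cut.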

We have also (historically) found that investing some time in .tsx 
rules improves taggers measurably.
> Might help us get around tagging errors like:
>
> $ echo "Avui no veig el sol." | apertium -d . ca-en-tagger
> ^Avui<adv>$ ^no<adv>$ ^veure<vblex><pri><p1><sg>$ ^el<det><def><m><sg>$
> ^sol<adj><m><sg>$^.<sent>$^.<sent>$
Fran, what would be a reasonable "forbidding" rule here that repairs 
this error but does not break things somewhere else?
>
> $ echo "Why does she do that?" | apertium -d . en-ca-tagger
> ^Why<adv><itg>$ ^do<vbdo><pri><p3><sg>$ ^prpers<prn><subj><p3><f><sg>$
> ^do<vbdo><pres>$ ^that<cnjsub>$^?<sent>$^.<sent>$

I think this could easily be dealt with in "pure", "canonical" Apertium 
using a simple forbid rule in the .tsx file. The fact that booboos like 
this one pass on to the transfer file is a clear indication that the 
.tsx file in apertium-en-ca needs love, rather than justifying the need 
for introducing a non-canonical CG3 module. I have also added a quick 
section in http://wiki.apertium.org/wiki/Constraint_Grammar to that effect.
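
A minimal sketch of what such a rule might look like, assuming that the 
reading to block is "that" as a subordinating conjunction right before 
sentence-final punctuation, and that the .tsx file defines labels named 
CNJSUB and SENT (hypothetical names; the actual labels in apertium-en-ca 
may differ):

```xml
<forbid>
  <!-- Hypothetical labels: they must match <def-label> names
       in the <tagset> of the real .tsx file. -->
  <label-sequence>
    <label-item label="CNJSUB"/>
    <label-item label="SENT"/>
  </label-sequence>
</forbid>
```

Whether a rule like this breaks anything elsewhere would have to be 
checked against a tagged corpus after retraining the tagger.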

You will notice that I make a strong point of not considering CG3 part 
of canonical or mainstream Apertium (I hope you grant me the right to 
show a reluctant position here as a creator of the original Apertium!). 
I make a similar point with respect to HFST, which is clearly 
non-canonical Apertium. I believe that using CG3 and HFST has 
effectively hindered reasonable usages of apertium-tagger and perhaps 
its development, and has also moved all attention away from improving 
the .metadix format, which has divergent dialects in different language 
pairs.

Call me conservative and radical, but I would have rather seen some 
development of apertium-tagger and the metadix format, instead of having 
to spend a long hour installing third-party tools such as OpenFST or 
vislcg3 on my machine before I can compile a language pair that requires 
such a Frankenstein configuration, and which would probably not need 
them if we had developed the core Apertium instead of patching around 
it. Currently some language pairs use two different formats for tagger 
decisions and two different formats for dictionaries. This, in my 
opinion, is far from ideal, and may be discouraging some 
Apertiumers. I am currently helping develop apertium-eng-kaz with three 
Kazakh students and the complexity shown by this module makes it harder 
than I thought to explain.

In the past, stubbornly sticking to some design tenets such as "vintage" 
1970s Unix-style pipelines and text formats has, in my opinion, 
contributed to having a lean, clear, homogeneous engine. One success of 
that approach is the development of multi-level transfer, with all its 
defects. That's why I will stubbornly defend canonicality!

I hope you get the point.

Cheers

Mikel

>
> Fran
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff


-- 
Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03071 Alacant, Spain
Phone: +34 96 590 9776
Fax: +34 96 590 9326

