Hi,

Currently, TSX rules apply only to two-category sequences, with the categories defined in the TSX file. Each such sequence is either forbidden or enforced. This is much less than what cg-proc can do.

What I would like to see is a proper, lightning-fast finite-state implementation of VISL CG3 (or at least of a large subset of the kinds of rules that VISL CG3 supports). Is there a GSoC idea to finish this up? Who would be able to work on it?
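To make the restriction concrete, here is a minimal Python sketch of rules that only ever look at two adjacent categories. It is an illustration under stated assumptions, not how apertium-tagger is implemented: the category names (`DET`, `ADJ`, `NOUN`, `VERB`, `PREP`) and the helpers `allowed` and `surviving_paths` are hypothetical, and the paths are enumerated by brute force for clarity.

```python
from itertools import product

# Hypothetical coarse categories; in a real setup they would be defined in the TSX file.
FORBID = {("DET", "VERB")}                 # category pairs that may never be adjacent
ENFORCE_AFTER = {"PREP": {"DET", "NOUN"}}  # after PREP, only these categories may follow


def allowed(prev, curr):
    """Check one adjacent pair against the forbid and enforce rules."""
    if (prev, curr) in FORBID:
        return False
    required = ENFORCE_AFTER.get(prev)
    return required is None or curr in required


def surviving_paths(ambiguity_classes):
    """Return every tag sequence whose adjacent pairs are all allowed."""
    return [
        path
        for path in product(*ambiguity_classes)
        if all(allowed(a, b) for a, b in zip(path, path[1:]))
    ]


# The rule fires when the two categories are adjacent ...
print(surviving_paths([["DET"], ["NOUN", "VERB"]]))
# ... but not when another word intervenes, since only neighbouring pairs are inspected.
print(surviving_paths([["DET"], ["ADJ"], ["NOUN", "VERB"]]))
```

In the second call both readings survive, because a pairwise rule cannot see the determiner across the adjective. A CG rule can express that kind of context with a scanning test and a barrier.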
Cheers
Mikel

On 07/03/16 at 07:39, Per Tunedal wrote:
> Hi,
> Just a thought: couldn't this kind of rule just as well be implemented
> in the TSX-file that's used to train the tagger? In that case,
> retraining the tagger might do the trick as well.
> Yours,
> Per Tunedal
>
> On Fri, Mar 4, 2016, at 09:41, Kevin Brubeck Unhammer wrote:
>> Per Tunedal <[email protected]> wrote:
>>
>>> 'ta en blå kon' (=take a blue cone) to Danish. 'kon' might be the
>>> indefinite form of 'kon' (= cone) or the definite form of 'ko' (= the
>>> cow). We have:
>>>
>>> (kon→ kon<n>/ko<n>)
>>>
>>> Translating the whole sentence would give us:
>>>
>>> tag en blå kegle / tag en blå koen (= take a blue cone / take a blue the
>>> cow)
>>>
>>> Wouldn't that be quite revealing in many cases? In this case e.g. a
>>> statistical language model could easily separate the wheat from the
>>> chaff.
>> That example argues against your point – here the source language has
>> two analyses of "kon", with different ind/def taggings (as it should).
>>
>> This is not a lexical selection problem, but a morphological
>> disambiguation problem.
>>
>> It took me all of five minutes to write a CG rule to select indefinite
>> for nouns after indefinite determiners:
>>
>> LIST IndA = (adj ind) (adj comp) ;
>> SET NotIndA = (*) - IndA ;
>> REMOVE:en-blå-kon N + Def IF (0 N + Ind) (*-1 Det + Ind CBARRIER NotIndA) ;
>>
>> and a quick corpus diff seems to show it generalises well:
>>
>> http://sprunge.us/hhbf?diff
>>
>> --
>> Kevin Brubeck Unhammer
>>
>> GPG: 0x766AC60C
>> _______________________________________________
>> Apertium-stuff mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff

--
Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03071 Alacant, Spain
Phone: +34 96 590 9776
Fax: +34 96 590 9326
