Hi again,

Obviously, CG would be quite helpful for disambiguation when doing lemmatisation. Would it be complicated to add an option to use CG (if present)? Using the CG rules for the language would probably remove some more ambiguity.

Looking at the page http://wiki.apertium.org/wiki/Lemmatisation . What does the command actually do:

$ echo "Den här är en test." | apertium -d . swe-tagger | cg-proc guesser.bin | sed 's/<[^>]\+>//g' | cg-proc -n guesser.bin

Will give lemmatised output where the tokens are encased in ^ and $, and ambiguous stems/lemmas are given separated by '/'

Yours,
Per Tunedal

On Fri, Mar 4, 2016, at 09:41, Kevin Brubeck Unhammer wrote:
> Per Tunedal <[email protected]> čálii:
>
> > 'ta en blå kon' (=take a blue cone) to danish. 'kon' might be the
> > indefinite form of 'kon' (= cone) or the definite form of 'ko' (= the
> > cow). We have:
> >
> > (kon→ kon<n>/ko<n>)
> >
> > Translating the whole sentence would give us:
> >
> > tag en blå kegle / tag en blå koen (= take a blue cone / take a blue the
> > cow)
> >
> > Wouldn't that be quite revealing in many cases? In this case e.g. a
> > statistical language model could easily separate the wheat from the
> > chaff.
>
> That example argues against your point – here the source language has
> two analyses of "kon", with different ind/def taggings (as it should).
>
> This is not a lexical selection problem, but a morphological
> disambiguation problem.
>
> It took me all of five minutes to write a CG rule to select indefinite
> for nouns after indefinite determiners:
>
> LIST IndA = (adj ind) (adj comp) ;
> SET NotIndA = (*) - IndA ;
> REMOVE:en-blå-kon N + Def IF (0 N + Ind) (*-1 Det + Ind CBARRIER NotIndA) ;
>
> and a quick corpus diff seems to show it generalises well:
>
> http://sprunge.us/hhbf?diff
>
> --
> Kevin Brubeck Unhammer
>
> GPG: 0x766AC60C
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
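
P.S. To make the output format discussed above concrete (tokens encased in ^ and $, alternatives separated by '/'), here is a minimal Python sketch that pulls the candidate lemmas out of such a stream. It is only an illustration under assumptions: the function name parse_units and the sample string are hypothetical, the sample is modelled on the kon<n>/ko<n> example in this thread and is not real analyser output, and escaped characters in the stream are not handled.

```python
import re

# One lexical unit: the text between ^ and $ (escaped characters not handled).
UNIT_RE = re.compile(r"\^([^$]*)\$")
# A morphological tag such as <n> or <det>.
TAG_RE = re.compile(r"<[^>]+>")


def parse_units(stream):
    """Return a list of (surface, [lemmas]) pairs found in stream-format text."""
    result = []
    for body in UNIT_RE.findall(stream):
        # The first field is the surface form, the rest are the analyses.
        surface, *analyses = body.split("/")
        lemmas = []
        for analysis in analyses:
            lemma = TAG_RE.sub("", analysis)  # strip tags, keep the lemma
            if lemma and lemma not in lemmas:
                lemmas.append(lemma)
        result.append((surface, lemmas))
    return result


if __name__ == "__main__":
    # Hypothetical sample, modelled on the example in this thread.
    sample = "^en/en<det>$ ^kon/kon<n>/ko<n>$"
    for surface, lemmas in parse_units(sample):
        print(surface, "->", " / ".join(lemmas))
```

A token that still has more than one lemma after this step (like 'kon' here) is exactly the kind of case where a CG rule such as the one quoted above could narrow things down before lemmatising.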
