Hi Kevin,

Yes, this could definitely be fixed before translation, as it's evident from the grammatical construction of the sentence. And of course it's much better to fix it before translation than after.

My point was that translation adds more information, which makes it quite easy to fix ambiguities that have not been sorted out before translation. Even simple solutions like a language model might help. And Apertium sv-da has a lot of problems of this kind - I don't know how much training of the tagger would have helped.

Anyhow, now we've got a brand new release of Apertium swe-dan with CG. Maybe some of these problems are solved by now. Unfortunately I've not been able to test it, as the two boxes of mine that run Apertium are bound for the city dump. I hope to see Apertium swe-dan soon at Apertium.org, or maybe I'll find some time to install Apertium on some other box. The Java versions cannot use CG.

Back to lemmatisation: What's the easiest way to do a disambiguation, rather than get a list of possible lemmas?

Yours,
Per Tunedal

On Fri, Mar 4, 2016, at 09:41, Kevin Brubeck Unhammer wrote:
> Per Tunedal <[email protected]> čálii:
>
> > 'ta en blå kon' (=take a blue cone) to danish. 'kon' might be the
> > indefinite form of 'kon' (= cone) or the definite form of 'ko' (= the
> > cow). We have:
> >
> > (kon→ kon<n>/ko<n>)
> >
> > Translating the whole sentence would give us:
> >
> > tag en blå kegle / tag en blå koen (= take a blue cone / take a blue the
> > cow)
> >
> > Wouldn't that be quite revealing in many cases? In this case e.g. a
> > statistical language model could easily separate the wheat from the
> > chaff.
>
> That example argues against your point – here the source language has
> two analyses of "kon", with different ind/def taggings (as it should).
>
> This is not a lexical selection problem, but a morphological
> disambiguation problem.
>
> It took me all of five minutes to write a CG rule to select indefinite
> for nouns after indefinite determiners:
>
> LIST IndA = (adj ind) (adj comp) ;
> SET NotIndA = (*) - IndA ;
> REMOVE:en-blå-kon N + Def IF (0 N + Ind) (*-1 Det + Ind CBARRIER NotIndA)
> ;
>
> and a quick corpus diff seems to show it generalises well:
>
> http://sprunge.us/hhbf?diff
>
> --
> Kevin Brubeck Unhammer
>
> GPG: 0x766AC60C
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
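
For readers less used to CG, the logic of the rule quoted above can be sketched roughly in Python. This is only a simplified model and not how a CG parser works internally: the Reading type, the helper names and the lowercase tag names (n, det, adj, ind, def, comp) are assumptions made for illustration, and the sets N, Det, Ind and Def used in the real rule are defined elsewhere in the grammar.

```python
from typing import FrozenSet, List, NamedTuple


class Reading(NamedTuple):
    """One analysis of a word form (hypothetical layout, not Apertium's format)."""
    lemma: str
    tags: FrozenSet[str]


Cohort = List[Reading]  # all readings of one word form


def reading(lemma: str, *tags: str) -> Reading:
    return Reading(lemma, frozenset(tags))


def is_ind_adj(r: Reading) -> bool:
    # Mirrors LIST IndA = (adj ind) (adj comp)
    return "adj" in r.tags and ("ind" in r.tags or "comp" in r.tags)


def has_ind_det_to_left(sentence: List[Cohort], i: int) -> bool:
    # Mirrors (*-1 Det + Ind CBARRIER NotIndA): scan leftwards for an
    # indefinite determiner, stopping at any word whose readings are all
    # something other than an indefinite or comparative adjective.
    for j in range(i - 1, -1, -1):
        cohort = sentence[j]
        if any({"det", "ind"} <= r.tags for r in cohort):
            return True
        if all(not is_ind_adj(r) for r in cohort):
            return False
    return False


def remove_def_after_ind_det(sentence: List[Cohort]) -> List[Cohort]:
    result = []
    for i, cohort in enumerate(sentence):
        # Mirrors (0 N + Ind): only act if an indefinite noun reading exists.
        has_ind_noun = any({"n", "ind"} <= r.tags for r in cohort)
        if has_ind_noun and has_ind_det_to_left(sentence, i):
            kept = [r for r in cohort if not {"n", "def"} <= r.tags]
            result.append(kept or cohort)  # never remove the last reading
        else:
            result.append(cohort)
    return result


# 'ta en blå kon', with 'kon' ambiguous between kon<n><ind> and ko<n><def>
sentence = [
    [reading("ta", "vblex", "imp")],
    [reading("en", "det", "ind")],
    [reading("blå", "adj", "ind")],
    [reading("kon", "n", "ind"), reading("ko", "n", "def")],
]
disambiguated = remove_def_after_ind_det(sentence)
print([r.lemma for r in disambiguated[3]])
```

In this toy model only the 'kon' (cone) reading is left after 'en blå', so the ambiguity is resolved on the source side, before any translation or language model is involved.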
