Hi Kevin,
Yes, this could definitely be fixed before translation, as the right
reading is evident from the grammatical construction of the sentence. And
of course it's much better to fix it before translation than after.

My point was that translation adds more information, and this makes it
possible to fix quite easily ambiguities that have not been sorted out
before translation. Even simple solutions like a language model might
help.
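
To illustrate the language-model idea, here is a minimal Python sketch.
It assumes a toy bigram model with add-one smoothing; the corpus, the
function names and the candidate strings are made up for illustration
and are not part of Apertium.

```python
import math
from collections import Counter

# Hypothetical toy corpus of Danish sentences (an assumption for this sketch);
# a real language model would be trained on a large target-language corpus.
CORPUS = [
    "jeg har en blå kegle",
    "hun ser en blå bil",
    "koen står på marken",
    "han malker koen",
    "tag en rød kegle",
]


def train_bigram(sentences):
    """Count context words and bigrams, with sentence boundary markers."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Add-one smoothed bigram log-probability of a sentence."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def pick_best(candidates, model):
    """Return the candidate translation the model finds most probable."""
    return max(candidates, key=lambda c: log_prob(c, model))


MODEL = train_bigram(CORPUS)
# The two translations produced by the ambiguous analysis of 'kon'.
CANDIDATES = ["tag en blå kegle", "tag en blå koen"]
BEST = pick_best(CANDIDATES, MODEL)
print(BEST)
```

With this toy corpus the model prefers the 'kegle' candidate, simply
because 'blå kegle' has been seen and 'blå koen' has not.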

And Apertium sv-da has a lot of problems of this kind - I don't know how
much training the tagger would have helped. Anyhow, now we've got a brand
new release of Apertium swe-dan with CG. Maybe some of these problems
are solved by now. Unfortunately I've not been able to test it, as the two
of my boxes that ran Apertium are bound for the city dump. I hope to see
Apertium swe-dan soon at Apertium.org, or maybe I'll find some time to
install Apertium on some other box. The Java versions cannot use CG.

Back to lemmatisation:
What's the easiest way to get a single disambiguated lemma, rather than a
list of possible lemmas?

Yours,
Per Tunedal

On Fri, Mar 4, 2016, at 09:41, Kevin Brubeck Unhammer wrote:
> Per Tunedal <[email protected]> wrote:
> 
> > 'ta en blå kon' (=take a blue cone) to danish. 'kon' might be the
> > indefinite form of 'kon' (= cone) or the definite form of 'ko' (= the
> > cow). We have:
> >
> >  (kon→ kon<n>/ko<n>)
> >
> > Translating the whole sentence would give us:
> >
> > tag en blå kegle / tag en blå koen (= take a blue cone / take a blue the
> > cow)
> >
> > Wouldn't that be quite revealing in many cases? In this case e.g. a
> > statistical language model could easily separate the wheat from the
> > chaff.
> 
> That example argues against your point – here the source language has
> two analyses of "kon", with different ind/def taggings (as it should).
> 
> This is not a lexical selection problem, but a morphological
> disambiguation problem.
> 
> It took me all of five minutes to write a CG rule to select indefinite
> for nouns after indefinite determiners:
> 
> LIST IndA = (adj ind) (adj comp) ;
> SET NotIndA = (*) - IndA ;
> REMOVE:en-blå-kon N + Def IF (0 N + Ind) (*-1 Det + Ind CBARRIER NotIndA) ;
> 
> and a quick corpus diff seems to show it generalises well:
> 
> http://sprunge.us/hhbf?diff
> 
> -- 
> Kevin Brubeck Unhammer
> 
> GPG: 0x766AC60C
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
