Re: [Apertium-stuff] Choosing the right word was: Re: Status of the pair sv-da

Per Tunedal Sun, 12 May 2013 12:56:47 -0700

Hi Mikel,
yes, you're perfectly right. I didn't bother to look for the right word
when replying: a fatal error in a discussion regarding the above subject
:-)


I apologize for my negligence. Maybe 'ambiguity' would have been a
better word, since it covers all of your examples. My point is that it's
important to choose the right word!

Yours,
Per Tunedal

On Sun, May 12, 2013, at 21:43, Mikel Forcada wrote:
> Per:
> > Hi,
> > I regard categorical ambiguity (part of speech ambiguity) as a special
> > case of polysemy. What's important is translating to a word with the
> > right meaning.
> I think confusing categori[c]al ambiguity with polysemy is not a good 
> idea, and a source of problems. I agree they have a similar result 
> (different translations) but they are treated differently, because they 
> are different in nature. I've been building rule-based machine 
> translation systems for almost 15 years, and this was clear to me from 
> the outset.
> 
> Polysemy is a property of the lemma of a word, and it is shared by all 
> its inflected forms: _station_ is as polysemic as _stations_, because 
> they have the same lemma _station_.  The change in meaning has no 
> syntactical effect: "I love this station" could be "I love this train 
> station" or "I love this radio station"
> 
> Categorial ambiguity is a property of a particular surface form (e.g. 
> "books") and affects syntax: in "He books a room", "books" can only be
> a verb.
> 
> There is a third case of ambiguity, that occurs when a surface form has 
> more than one lexical form, but all have the same category. For 
> instance, in Spanish, "creo" may be "I believe" or "I create": same 
> category, same tense, same person and number.
> 
> In my teaching I like to call the last two kinds of ambiguity "homography".
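
As a rough sketch of the three cases described above (not anything taken from Apertium itself): the names LexicalForm, classify_surface_form and is_polysemous are made up for illustration, and the category tags and sense lists are simplified assumptions.

```python
from typing import Dict, List, NamedTuple


class LexicalForm(NamedTuple):
    """One analysis of a surface form: a lemma plus a category tag (assumed tags)."""
    lemma: str
    category: str


def classify_surface_form(analyses: List[LexicalForm]) -> str:
    """Classify the ambiguity of a surface form from its lexical forms."""
    unique = set(analyses)
    if len(unique) <= 1:
        return "unambiguous surface form"
    categories = {lf.category for lf in unique}
    if len(categories) > 1:
        # e.g. "books": noun or verb
        return "categorial ambiguity (homography)"
    # e.g. Spanish "creo": two verbs, same category
    return "same-category homography"


def is_polysemous(lemma: str, senses: Dict[str, List[str]]) -> bool:
    """Polysemy belongs to the lemma, so every inflected form shares it."""
    return len(senses.get(lemma, [])) > 1


# Hypothetical toy data, for illustration only.
books = [LexicalForm("book", "n"), LexicalForm("book", "vblex")]
creo = [LexicalForm("creer", "vblex"), LexicalForm("crear", "vblex")]
stations = [LexicalForm("station", "n")]
senses = {"station": ["train station", "radio station"]}

print(classify_surface_form(books))
print(classify_surface_form(creo))
print(classify_surface_form(stations), is_polysemous("station", senses))
```

Note that "stations" comes out as an unambiguous surface form and is still polysemous through its lemma, which is the distinction being drawn.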
> 
> >
> > Yes, I intend to try Fran's new lexical selection module. But I was just
> > thinking that the current work flow is a bit odd:
> >
> > 1. Wouldn't it be more adequate to begin with finding the right word,
> Unless you define what you call "finding the right word", there is not 
> much I can do to help.
> > rather than trying to fix it afterwards with a lexical selection module?
> > Yes, this is a new work flow with a disambiguator, rather than a tagger,
> > choosing the right word and indirectly deciding part of speech. (Rather
> > than the opposite).
> Define "choosing the right word" and how you intend to do that.
> >
> > Or alternatively:
> >
> > 2. Why not collect all possible translation options and evaluate them,
> > choosing the translation that seems most meaningful or fluent?
> Felipe Sánchez-Martínez's PhD thesis, of which I was co-advisor, studied 
> how HMM-based part-of-speech taggers could be trained using a 
> target-language model. And as a point of comparison, he used a system 
> that chose the best disambiguation of each sentence using a statistical 
> target language model. In the languages he studied (all European) he 
> found an ambiguity rate of 1.3 lexical forms per surface form. For a 
> typical 20-word sentence, this means that you have to consider about 
> 1.3^20 ≈ 190 readings. He had to do this during training. He devised a way to 
> choose the winner at translation time before having to score all 
> possible readings, and got quite far.
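
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names and the per-word analysis counts in the second example are assumptions for illustration, not figures from the thesis.

```python
import math
from typing import Sequence


def expected_readings(avg_ambiguity: float, sentence_length: int) -> float:
    """Estimated number of readings if every word has the average ambiguity."""
    return avg_ambiguity ** sentence_length


def exact_readings(analyses_per_word: Sequence[int]) -> int:
    """Exact number of readings: the product of the per-word analysis counts."""
    return math.prod(analyses_per_word)


# 1.3 lexical forms per surface form, 20-word sentence.
print(round(expected_readings(1.3, 20)))  # 190

# Hypothetical 4-word sentence where two of the words have two analyses each.
print(exact_readings([1, 2, 1, 2]))  # 4
```

The count grows exponentially with sentence length, so a 40-word sentence at the same rate would already give about 190 * 190 readings.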
> 
> BTW, one interesting result was that his tagging accuracy was not as 
> good as that of a tagger trained on hand-tagged text, but the 
> translation error rate was almost as good. Tagging was just a way to get 
> the best translation.
> > (Something like what's done in statistical translation by weighting the
> > translations by the language model.)
> Yes, see above.
> > Yes, this is a new "parallel" work flow with several competing
> > translations, evaluated in the end.
> Slow as molasses. Ask Felipe.
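
For what it's worth, option 2 can be sketched in a few lines. Everything here is a toy under stated assumptions: the translation options, the bigram scores and the names OPTIONS, BIGRAM_LOGPROB, score and best_translation are invented for illustration, and this is not Felipe's system or any Apertium module. Note that product() enumerates every combination, which is exactly the exponential cost mentioned above.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical translation options for each word of Spanish "creo en ti".
OPTIONS: List[List[str]] = [["I believe", "I create"], ["in"], ["you"]]

# Hypothetical target-language bigram log-probabilities (made-up values).
BIGRAM_LOGPROB: Dict[Tuple[str, str], float] = {
    ("I", "believe"): -2.0,
    ("I", "create"): -2.5,
    ("believe", "in"): -1.0,
    ("create", "in"): -4.0,
    ("in", "you"): -1.5,
}
UNSEEN = -10.0  # assumed penalty for bigrams missing from the toy model


def score(sentence: str) -> float:
    """Sum the bigram log-probabilities of a candidate translation."""
    words = sentence.split()
    return sum(BIGRAM_LOGPROB.get(pair, UNSEEN) for pair in zip(words, words[1:]))


def best_translation(options: List[List[str]]) -> str:
    """Enumerate every combination of options and keep the best-scoring one."""
    candidates = [" ".join(choice) for choice in product(*options)]
    return max(candidates, key=score)


print(best_translation(OPTIONS))  # I believe in you
```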
> >
> > BTW I don't like the idea of using a constraint grammar. I hope
> > something more automatic could be invented.
> I will comment on that in a minute.
> 
> Mikel
> 
> -- 
> Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
> Departament de Llenguatges i Sistemes Informàtics
> Universitat d'Alacant
> E-03071 Alacant, Spain
> Phone: +34 96 590 9776
> Fax: +34 96 590 9326
> 
> 
> ------------------------------------------------------------------------------
> Learn Graph Databases - Download FREE O'Reilly Book
> "Graph Databases" is the definitive new guide to graph databases and 
> their applications. This 200-page book is written by three acclaimed 
> leaders in the field. The early access version is available now. 
> Download your free book today! http://p.sf.net/sfu/neotech_d2d_may
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff

