Re: [Apertium-stuff] Choosing the right word was: Re: Status of the pair sv-da

Mikel Forcada Sun, 12 May 2013 12:43:39 -0700

Per:
> Hi,
> I regard categorical ambiguity (part of speech ambiguity) as a special
> case of polysemy. What's important is translating to a word with the
> right meaning.
I think confusing categori[c]al ambiguity with polysemy is not a good 
idea, and a source of problems. I agree they have a similar result 
(different translations), but they are treated differently because they 
are different in nature. I've been building rule-based machine 
translation systems for almost 15 years, and this was clear to me from 
the outset.


Polysemy is a property of the lemma of a word, and it is shared by all 
its inflected forms: _station_ is as polysemic as _stations_, because 
they have the same lemma _station_.  The change in meaning has no 
syntactic effect: "I love this station" could be "I love this train 
station" or "I love this radio station".

Categorial ambiguity is a property of a particular surface form (e.g. 
"books") and affects syntax: in "He books a room", "books" can only be 
a verb, whereas in "He reads books" it can only be a noun.

There is a third case of ambiguity, that occurs when a surface form has 
more than one lexical form, but all have the same category. For 
instance, in Spanish, "creo" may be "I believe" or "I create": same 
category, same tense, same person and number.

In my teaching I like to call the last two kinds of ambiguity "homography".
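
As a rough illustration of the distinction (a sketch only, not Apertium 
code: the tag names, the toy analyses and the function name are all 
assumptions made up for this example), one could tell the two kinds of 
homography apart by looking at the lexical forms of a surface form. 
Polysemy does not show up at this level at all, since "station" has a 
single lexical form whatever its sense.

```python
# Toy analyses: surface form -> list of (lemma, category) lexical forms.
# The entries and tag names are illustrative assumptions, not real dictionary data.
TOY = {
    "books": [("book", "n"), ("book", "vblex")],       # noun vs. verb
    "creo": [("creer", "vblex"), ("crear", "vblex")],  # "I believe" vs. "I create"
    "station": [("station", "n")],                     # polysemic, but one lexical form
}


def classify(analyses):
    """Classify a surface form by the lexical forms it can receive."""
    unique = set(analyses)
    if len(unique) <= 1:
        # One lexical form: any remaining ambiguity is polysemy of the lemma.
        return "unambiguous"
    categories = {category for _, category in unique}
    if len(categories) > 1:
        return "categorial ambiguity"
    # Several lexical forms, all with the same category.
    return "same-category homography"


for surface, analyses in TOY.items():
    print(surface, "->", classify(analyses))
```

Note that "station" comes out as unambiguous here: choosing between the 
train and the radio sense is a lexical selection problem, not a tagging 
problem.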




>
> Yes, I intend to try Fran's new lexical selection module. But I was just
> thinking that the current work flow is a bit odd:
>
> 1. Wouldn't it be more adequate to begin with finding the right word,
Unless you define what you call "finding the right word", there is not 
much I can do to help.
> rather than trying to fix it afterwards with a lexical selection module?
> Yes, this is a new work flow with a disambiguator, rather than a tagger,
> choosing the right word and indirectly deciding part of speech. (Rather
> than the opposite).
Define "choosing the right word" and how you intend to do that.
>
> Or alternatively:
>
> 2. Why not collect all possible translation options and evaluate them,
> choosing the translation that seems most meaningful or fluent?
Felipe Sánchez-Martínez's PhD thesis, of which I was co-advisor, studied 
how HMM-based part-of-speech taggers could be trained using a 
target-language model. And as a point of comparison, he used a system 
that chose the best disambiguation of each sentence using a statistical 
target language model. In the languages he studied (all European) he 
found ambiguity rates of 1.3 lexical forms per surface form. This, for 
your typical 20-word sentence, means that you have to consider 1.3^20, 
or about 190, readings. He had to do this during training. He devised a way to 
choose the winner at translation time before having to score all 
possible readings, and got quite far.
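
To make the arithmetic concrete, here is a minimal sketch of the 
exhaustive approach: enumerate every reading of a sentence and keep the 
one a scorer likes best. It is not Felipe's method or code. The function 
names, the toy sentence and the bigram "scorer" (a crude stand-in for a 
real target-language model) are all assumptions for illustration.

```python
import math
from itertools import product


def expected_readings(rate, length):
    """Rough number of readings for a given average ambiguity rate."""
    return rate ** length


def count_readings(analyses_per_word):
    """Exact number of readings: product of the per-word analysis counts."""
    return math.prod(len(analyses) for analyses in analyses_per_word)


def best_reading(analyses_per_word, score):
    """Score every possible reading and return the highest-scoring one."""
    return max(product(*analyses_per_word), key=score)


# Toy sentence "he books a room" with made-up (lemma, tag) analyses.
SENTENCE = [
    [("he", "prn")],
    [("book", "n"), ("book", "vblex")],
    [("a", "det")],
    [("room", "n"), ("room", "vblex")],
]

# Hypothetical stand-in for a language model: reward plausible tag bigrams.
GOOD_BIGRAMS = {("prn", "vblex"), ("det", "n")}


def toy_score(reading):
    tags = [tag for _, tag in reading]
    return sum((a, b) in GOOD_BIGRAMS for a, b in zip(tags, tags[1:]))


print(round(expected_readings(1.3, 20)))   # 190
print(count_readings(SENTENCE))            # 4
print(best_reading(SENTENCE, toy_score))
```

The cost grows exponentially with sentence length, which is why scoring 
all readings is only bearable during training.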

BTW, one interesting result was that his tagging accuracy was not as 
good as that of a tagger trained on hand-tagged text, but the 
translation error rate was almost as good. Tagging was just a way to get 
the best translation.
> (Something like what's done in statistical translation by weighting the
> translations by the language model.)
Yes, see above.
> Yes, this is a new "parallel" work flow with several competing
> translations, evaluated in the end.
Slow as molasses. Ask Felipe.
>
> BTW I don't like the idea of using a constraint grammar. I hope
> something more automatic could be invented.
I will comment on that in a minute.

Mikel

-- 
Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03071 Alacant, Spain
Phone: +34 96 590 9776
Fax: +34 96 590 9326


