Simon's comment opens an interesting question: if you have two systems with exactly the same rules/statistical tables, and both do MT using exactly the same code, but one system's resources were built by hand and the other's by machine learning, are they then to be classified as essentially the same or as different (namely rule-based and statistical)?


It seems to me the method of construction is less important than the method of operation, so I'd vote for calling them similar.

Assuming then that the method of construction is less relevant, one can turn to the nature of the data structures. Nowadays, Phrasal SMT and EBMT are starting to look quite similar, though they are not yet exactly the same thing, for at least two reasons:
1. PSMT's "example phrases" almost always include syntactically weird things that no human phrase-writer would ever create, such as "(in the" and "1. The".
2. PSMT's 'phrase dictionaries' are typically *much* larger than EBMT's, and include probabilities as scores. These probabilities are similar but not identical to the rating scores usually used in EBMT (ratings don't add up to 1.0, can't be combined probabilistically, etc.).
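To make the second point concrete, here is a minimal sketch (with entirely made-up toy data and hypothetical phrase pairs) of the difference between the two kinds of scores: in a PSMT phrase table, the translation scores for a given source phrase form a probability distribution summing to 1.0, while EBMT goodness ratings carry no such constraint.

```python
# Toy PSMT phrase table: P(target | source) for one source phrase.
# These conditional probabilities must sum to 1.0 per source phrase.
psmt_phrase_table = {
    "(in the": {"(dans le": 0.7, "(dans la": 0.2, "(en": 0.1},
}

# Toy EBMT example base: each stored fragment carries a goodness rating.
# Ratings for one source phrase need not sum to anything in particular
# and cannot be combined probabilistically.
ebmt_examples = {
    "(in the": {"(dans le": 8.5, "(dans la": 6.0},
}

for source, targets in psmt_phrase_table.items():
    print(f"PSMT probabilities for {source!r} sum to {sum(targets.values())}")

for source, targets in ebmt_examples.items():
    print(f"EBMT ratings for {source!r} sum to {sum(targets.values())}")
```

The normalization is what lets PSMT scores plug directly into a probabilistic decoder, whereas EBMT ratings are compared only against each other.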


But the core intuition is exactly the same: find the high-correspondence fragments, record them with some kind of goodness rating, and re-use them as much as possible.

One could perhaps say PSMT and EBMT are two ways of trying to do exactly the same thing, and that the differences arise naturally from the different methodologies. How *essential* the differences are is a matter for a future study, it seems to me.

E



At 16:00 +1100 2/3/05, Simon Zwarts wrote:
On Thu, 3 Feb 2005 08:11, Alberto Manuel Brandao Simoes wrote:
 Meanwhile, I found this article from Microsoft Research:

 http://research.microsoft.com/research/pubs/view.aspx?type=Publication&id=1354

 After reading the introduction, almost all the examples of what they call
 Phrasal SMT seem (to me) to be examples of EBMT systems.

Hello,

First of all, I think they admit right from the start that they are working on
"... bridged the gap between the domain-specific learning of Example-based and SMT
systems and ..." (although they are referring in this quote to a previous
system, they indicate that they want to solve some problems there).

Why this should still be classified as an SMT system rather than an
Example-based system is because they still employ the typical SMT noisy
channel model (see Chapter 3), where the problem splits into the well-known
two parts of decoding and the language model.
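For readers less familiar with the noisy channel model Simon mentions, here is a toy sketch (all scores are invented) of the split: the decoder searches for the target sentence e maximizing P(f | e) * P(e), where the translation model P(f | e) judges faithfulness and the language model P(e) judges fluency.

```python
def translation_model(f, e):
    # P(f | e): hypothetical faithfulness scores for one input sentence.
    # Note the TM alone cannot rule out the scrambled word order.
    scores = {
        ("la maison", "the house"): 0.6,
        ("la maison", "house the"): 0.6,
        ("la maison", "the home"): 0.3,
    }
    return scores.get((f, e), 0.0)

def language_model(e):
    # P(e): hypothetical fluency scores; "house the" is heavily penalized.
    scores = {"the house": 0.5, "house the": 0.01, "the home": 0.4}
    return scores.get(e, 0.0)

def decode(f, candidates):
    # Decoding = argmax over candidate translations of TM * LM.
    return max(candidates, key=lambda e: translation_model(f, e) * language_model(e))

best = decode("la maison", ["the house", "house the", "the home"])
print(best)  # -> "the house"
```

The point of the split is visible in the toy numbers: the translation model ties "the house" and "house the", and it is the language model that breaks the tie, which is exactly the division of labor that marks a system as SMT in the noisy-channel sense.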





--
Eduard Hovy                          email: [EMAIL PROTECTED]
USC Information Sciences Institute   tel: 310-448-8731
4676 Admiralty Way                   fax: 310-823-6714
Marina del Rey, CA 90292-6695
http://www.isi.edu/natural-language/nlp-at-isi.html

_______________________________________________
Mt-list mailing list
