Nowadays Phrasal SMT and EBMT are starting to look quite similar, though they are not yet exactly the same thing, for at least two reasons:
1. PSMT's "example phrases" almost always include syntactically odd fragments that no human phrase-writer would ever create, such as "(in the" and "1. The".

That's also the case in some EBMT systems. Ralf Brown, for instance, with "clustered transfer rules ..." extracts something like "maintenant de <--> now the".

2. PSMT's 'phrase dictionaries' are typically *much* larger than EBMT's, and include probabilities as scores. These probabilities are similar but not identical to the rating scores usually used in EBMT (ratings don't add up to 1.0, can't be combined probabilistically, etc.).

I would not accept size as an exclusive criterion. Also, whether probabilities or weights are used is, for some people, a matter of philosophy. But I don't see why probabilities should not be used in EBMT, provided that you have a closed set of references.
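To make the probabilities-vs-ratings distinction concrete, here is a minimal sketch (the phrase pairs and counts are invented toy data, not from any cited system) of how a PSMT-style phrase table can be estimated from co-occurrence counts, so that the scores for each source phrase form a proper distribution summing to 1.0:

```python
from collections import Counter, defaultdict

# Toy phrase-pair counts, as they might be extracted from a word-aligned
# corpus; note the syntactically "weird" fragments mentioned above.
pair_counts = Counter({
    ("maintenant de", "now the"): 3,
    ("maintenant", "now"): 5,
    ("dans le", "in the"): 6,
    ("dans le", "in a"): 2,
})

# Conditional probabilities p(target | source): normalize the counts per
# source phrase so each distribution sums to 1.0 -- unlike arbitrary
# EBMT rating scores, these can be combined probabilistically.
source_totals = defaultdict(int)
for (src, _), c in pair_counts.items():
    source_totals[src] += c

phrase_table = {
    (src, tgt): c / source_totals[src]
    for (src, tgt), c in pair_counts.items()
}

for (src, tgt), p in sorted(phrase_table.items()):
    print(f"p({tgt!r} | {src!r}) = {p:.2f}")
```

An EBMT rating scheme could assign the same pairs any monotone goodness scores; the difference is only that the normalized version supports probabilistic combination with other models.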

But the core intuition is exactly the same: find the high-correspondence fragments, record them with some kind of goodness rating, and re-use them as much as possible.

One could perhaps say PSMT and EBMT are two ways of trying to do exactly the same thing, and the differences arise naturally from the different methodologies. How *essential* the differences are is, it seems to me, a matter for a future study.

I admit that there are many types of EBMT which are difficult to subsume
under one definition. However, it seems to me that SMT follows a bottom-up
(or better: concatenation) approach in generation. While this is also
true for some EBMT systems, template-based EBMT has a top-down generation
view: slots in templates are recursively filled until no variables remain
and/or all items are translated. This comes close to some sort of 'pure'
EBMT (e.g. some of Sumita's work) where items in a retrieved sentence
are substituted. Translation units (be they words or phrases) are atoms in
SMT: there is no substitution of words within a phrase. In (much of)
EBMT, exactly this is the issue: how and what can be substituted under
what conditions *in a larger context*.
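The top-down, template-based generation described above could be sketched roughly as follows (the templates, slot names, and dictionary here are all invented for illustration, not from any particular EBMT system):

```python
import re

# Invented toy transfer templates: slots like <NP> are filled recursively.
templates = {
    "<S>": "<NP> is <ADJ>",
    "<NP>": "the <N>",
}

# Invented toy bilingual dictionary for the leaf items.
dictionary = {"<N>": "house", "<ADJ>": "green"}

def generate(symbol: str) -> str:
    """Top-down generation: expand a template, then recursively fill
    every remaining slot until no variables are left."""
    expansion = templates.get(symbol, dictionary.get(symbol, symbol))
    # Substitute each slot in the expansion recursively.
    return re.sub(r"<[A-Z]+>", lambda m: generate(m.group(0)), expansion)

print(generate("<S>"))  # → the house is green
```

The contrast with SMT is visible in the recursion: a slot is substituted *inside* a larger template context, whereas a phrase-based decoder only concatenates atomic units left to right.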

I would agree with Simon when he says: "Furthermore where in an EBMT
system you try to store as much examples as possible for future use,
most "examples" [in SMT] "disappear" in statistics."

As soon as discontinuous phrases (i.e. templates) are used in SMT,
EBMT and SMT will indeed be hard to distinguish.

Michael





At 16:00 +1100 2/3/05, Simon Zwarts wrote:

On Thu, 3 Feb 2005 08:11, Alberto Manuel Brandao Simoes wrote:

 Meanwhile, I found this article from Microsoft Research:

http://research.microsoft.com/research/pubs/view.aspx?type=Publication&id=1354

 After reading the introduction, almost all examples of what they call
 Phrasal SMT seem (to me) to be examples of EBMT systems.


Hello,

First of all, I think they admit right from the start that they are working on
"... bridged the gap between the domain-specific learning of Example-based and SMT
systems and ..." (although they are referring in this quote to a previous
system, they indicate that they want to solve some problems there).


The reason this should still be classified as an SMT system rather than an
Example-based system is that they still employ the typical SMT noisy-channel
model (see Chapter 3), where the problem splits into the well-known two
parts of decoding and the language model.
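The noisy-channel split can be written out explicitly: the decoder searches for the target sentence e maximizing P(e) * P(f | e), a language-model score times a translation-model score. A toy sketch with hand-picked, purely illustrative scores (not from the paper under discussion):

```python
# Toy candidate translations for some source sentence f, each with an
# invented translation-model score P(f|e) and language-model score P(e).
candidates = {
    "now the house": {"tm": 0.4, "lm": 0.2},
    "the house now": {"tm": 0.3, "lm": 0.5},
}

# Noisy-channel decoding: argmax over e of P(f|e) * P(e).
best = max(candidates, key=lambda e: candidates[e]["tm"] * candidates[e]["lm"])
print(best)  # → the house now
```

Nothing in this objective refers to stored examples as such, which is why the examples "disappear" into the statistics once the two models are estimated.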







_______________________________________________
Mt-list mailing list
