Hi all,

A few comments on this ongoing discussion:

I suspect that the bias Andy's experience reflects (as expressed in 
the questions he posed) was unintended, but it is a natural consequence 
of the current high visibility of SMT in the research arena.  Particularly 
at broader (non-MT-specific) conferences, reviewers who are not experts in 
MT may have limited awareness of MT research beyond what they have seen 
and heard at recent (non-MT-specific) conferences, which nowadays will 
predominantly be SMT work.  That doesn't mean non-SMT papers can't be 
accepted at such conferences, but it does make things more difficult, 
even if unintentionally.  I think this is less likely to happen at 
MT-specific conferences because of the broader MT experience of that 
community of reviewers.

> >Unless I've badly misunderstood all the papers I have read, EBMT does not
> >build anything by hand. Existing translated texts are used as sources of
> >examples which are sought out and reused on the fly. In some reported
> >experiments, the examples were handpicked, or pruned to get rid of awkward
> >cases, but I don't think this idea is taken seriously as the way to do EBMT.
> 
> I stand corrected.  Thanks Harry.  In that case, it would be nice to 
> know how the learning methods of SMT and EBMT differ, and which type 
> gives better (more comprehensive/useful/etc.) results for how much 
> (effort/computation/data/etc.).

I agree with Ed that an in-depth comparative investigation of EBMT and
SMT could be very insightful for researchers working on both of these
approaches, and for the MT research community at large.  The problem,
of course, is that very few researchers/sites have the necessary 
resources to conduct such an investigation on their own.
We at CMU may in fact be one of the few places that could carry out
such a serious investigation, since we have subgroups working on EBMT
and SMT, as well as my own team, which is working on automatically 
learning transfer rules from small amounts of manually aligned elicited data.  

While we haven't yet done an in-depth comparative analysis of the
learning methods underlying our SMT, EBMT, and transfer-rule systems, 
we do have a couple of data points where these systems were trained and 
tested on exactly the same data.  Our EBMT and SMT systems participated
in the past three TIDES MT evaluations, using large amounts of training 
data.  Also, as part of last year's DARPA/TIDES "Surprise Language 
Exercise" on Hindi-to-English, we ran an experiment comparing the
performance of our learning-based transfer system with our SMT and EBMT
systems in a scenario where all three were trained on very small amounts 
of data.  Those interested in the results may want to take a look at the 
following:
http://www-2.cs.cmu.edu/~alavie/papers/TALIP-SLE-03.pdf

The focus of that experiment was our transfer-rule learning approach,
so the results don't really shed much light on EBMT vs. SMT, beyond the
fact that, as might be expected, neither EBMT nor SMT works well with
very limited amounts of training data.  Our transfer-rule learning
approach worked better than SMT in this experiment, and SMT worked
better than EBMT, but this was a consequence of both the specifics of
the training-data scenario and the specifics of the systems involved.
Without further investigation, the results of our experiment certainly
don't support any broad conclusions about SMT and EBMT in general,
beyond what I already mentioned above.

Since EBMT systems, and now SMT systems as well, come in many different 
flavors, it may in fact be quite difficult to establish broad conclusions 
about the relative strengths and weaknesses of EBMT and SMT as paradigms, 
conclusions that are not just due to properties of specific instances of 
the two approaches.  But it's probably worth looking into.  If anyone 
knows of any existing work along these lines, I would be interested to 
hear about it.

- Alon

-----------------------------------------------------------------------------
Dr. Alon Lavie                     Tel : (+1-412) 268-5655
Language Technologies Institute    Fax : (+1-412) 268-6298
Carnegie Mellon University         E-mail: [EMAIL PROTECTED]
Pittsburgh, PA 15213  USA          Homepage: http://www.cs.cmu.edu/~alavie
-----------------------------------------------------------------------------


