Adding to Jeff Allen's comments: had SMT been the only successful paradigm, we would by now have many commercially successful MT systems, since large parallel corpora are now available for several language pairs!
A practical MT system is:

RBMT (x%) + EBMT (y%) + KBMT (z%) + SMT (w%) => HMT (Hybrid MT) || => MEMT

where the values of x, y, z & w are application dependent, which in turn are driven by market forces.

RMK Sinha
[EMAIL PROTECTED]
IIT Kanpur, India

>
> Message: 2
> Date: Thu, 8 Jul 2004 22:47:06 +0200
> From: Jeff Allen <[EMAIL PROTECTED]>
> To: Andy Way <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED], [EMAIL PROTECTED]
> Subject: Re: [Mt-list] Where is MT at today?
>
> Hi Andy,
>
> Thanks for your post to the list.
>
> A few comments:
>
> > I think this is my main concern: SMT is very well (and deservedly so) established nowadays as the main way to do MT. Unless you're an MT person, you'd think that it was the _only_ way to do MT, as here.
>
> I would say:
>
> 1. Can SMT currently be used in implementations across the full spectrum of real-user needs for the billions of dollars / euros of communications needs today?
>
> 2. How many languages is SMT currently valid for?
>
> Well, read my article "What about SMT?" in the IJLD (available on my website http://www.geocities.com/jeffallenpubs/) and that should shed some light on the topic.
>
> > 1. Can papers on EBMT succeed in getting published (especially in non-expert, i.e. MT-specific, conferences) without making direct comparisons to SMT?
>
> You bet. Look at all the panel talks and user implementation case studies concerning Translation Memory (TM) systems that have been presented at conferences like ASLIB, LocalizationWorld, LREC, MT confs, and others over the past 20 years. There were tons of booths at Localization World Bonn last week (probably 30-50). About 350 participants and lots of user groups present. Many TM players were there. Very few commercial MT companies present, but many folks interested in MT in general. So where is the end-user tendency for MT-type systems?
>
> SMT is just one type of system. EBMT is a different type. Different methodologies, corresponding to different types of needs, at least for the moment. My panel talk at Localization World Bonn provided an outline of the types of MT systems and showed how the different features in various commercial translation software products correspond to varying types of translation approaches and needs. And I clearly stated that if you purchase a system that doesn't match your need, then you don't have the right to complain about it.
>
> ALLEN, Jeff. 2004. Inbound vs. Outbound Translation. Presentation in the "Localization for Customer Support" panel. LocalizationWorld, Bonn, Germany, 29 June - 1 July 2004.
> (not available yet online but will be soon).
>
> As for your paper, if you did not say in your submission that you were comparing EBMT to SMT, then I see no reason why your submission should be rejected for not doing so. I also review a lot of technical conference papers as well as language technology articles for MultiLingual. If something is incorrect or invalid to some point, I'll definitely make comments on it, and usually back it up with references. But the "only" way anyone can claim that any method is "the" best approach is to prove it from "market-driven" survey work among users. And from the many surveys I have conducted and published over the years in the field (again see my website under the Language Resources section), SMT is not the one that the majority of end-users have been implementing. This does not downgrade the value of SMT, but rather makes us look at it from the point of view of what it is good for, and what it is not yet good for.
>
> > "there is no discussion to how [our approach] would compare with more established techniques such as word-alignment using statistical models. Showing that [our approach] is comparable (or better) than the traditional way of acquiring phrase-alignment [SMT, references excluded here] would make this paper just great".
>
> Read my following recent article (short, 1-2 pages) that reminds us how people are usually biased in what they say. Always take a few steps back and look at the big picture:
>
> ALLEN, Jeffrey. March 2004. Thinking about machine translation: several questions to ask yourself when you read an article about MT technologies. In special supplement of Multilingual Computing and Technology magazine, Number 62, March 2004.
> See my website under the MT and MT postediting page (thematic category) or under the Multilingual Computing and Technology (publication channel category).
>
> > 3. Has EBMT as a paradigm been 'muscled out' by the more dominant SMT approach?
>
> Who says that SMT is dominant?
>
> Despite the fact that SMT might be the real cool thing to be doing research on (and yes I have done work on it too several years ago at CMU and was part of a thesis committee on applying SMT to a KBMT implementation), let's take a step back with the real end-user perspective.
>
> Which engine types are being implemented today as products in real-world contexts and are effectively financially meeting the billions of dollars / euros of the translation and localization needs in the global market (see all the survey results from IDC, Allied Business, Forrester, et al)?
>
> And which of these types of systems are realistically paying all of the salaries of the hundreds of MT researchers and implementers across the world today?
>
> * I really only know of 2 commercial companies working at implementing SMT.
> * I know of 2 companies doing KBMT commercial systems, and a few industrial projects implementing KBMT custom systems.
> * There are tons of commercial companies doing RBMT systems.
> * Many MT companies are implementing EBMT-like plug-in modules into the RBMT systems.
> * There are many Translation Memory (TM) companies whose EBMT-like tools are what thousands of human translators use on a daily basis for the overall translation and localization industry.
> * There are several TM tools that now have MT-approach features.
>
> So, which of these systems types are dominating the global market today?
>
> Then I look back at my own career over the past 10 years and analyze which systems types have really actually put food on my family's table:
>
> SMT for a small part of 2 years
> KBMT for 2 years
> MEMT for 2 years
> RBMT for several years
> EBMT types for several years
>
> I'm not at all trying to slam SMT, but I want to put into perspective what we mean by the "mainstream" and "dominant" approaches. All human translators I know use TM systems, and even TM is not always a productive solution for them. It takes a lot of evangelizing to convince the professional translator community to use RBMT and KBMT systems. See the discussion thread section on my web site on what I have done over the years to do so with professional translators on the LANTRA-L list.
>
> Yet, where is the majority of money being invested in products, and being spent by user groups and institutions?
>
> SMT?
> RBMT?
> EBMT?
> etc
>
> And how pure are the different systems?
>
> Let's recall that Bob Frederking wrote a good post to the MT-List about a year ago with regard to the definitions of different types of systems. I really liked his description of these different systems and think he is right on with the analogy he provided.
> It is an explanation that deserves being reread.
>
> I haven't answered all your questions, and have come back with more questions for all of us, yet I myself would have a hard time saying that SMT is "the dominant" approach today for real-world communication and translation needs.
>
> SMT has its place and is providing a lot of interesting results for academic research, industrial research, government research, and now some product/service offerings. It is more valid for some language directions and less for others. It is a very valuable component when combining it with other MT approaches. Yet calling it the "mainstream" approach honestly seems a bit ignorant to me given all that I've shown above. My web site provides lots of references to more info and details.
>
> Sorry for my long reply, but I hope it makes us all think a bit about what you (Andy) are saying to the community.
>
> Many thanks again for your request for comments. I'll be offline without e-mail for a week.
>
> Jeff
>
> Quoting Andy Way <[EMAIL PROTECTED]>:
> > I'm going to try very hard not to make this sound like a rant. Rather, I hope the following (probably long-winded) observations may seed an interesting debate as to where we are these days w.r.t. corpus-based MT, and MT in general.
> >
> > As many of you know, I submit to and review for many NLP and (especially) MT conferences. In my experience (and I trust this is relatively uncontroversial), the vast majority of MT papers that one sees nowadays are corpus-based. Now, even though I work mostly in the area of EBMT, I think it is again uncontroversial to state that most of the corpus-based MT papers one sees are not EBMT, but rather SMT. Herein lies the point I would like to make.
> >
> > We submitted a paper recently to a conference (I won't say which, but it wasn't an MT conference per se) which was turned down. The paper received 3 reviews. One comment received was:
> >
> > "the paper ... completely ignores the current mainstream empirical approach to machine translation: phrase-based or template-based statistical machine translation".
> >
> > This was true - it did. One useful comment was for us to compare our approach with an SMT approach - we're trying this out as we speak, but I wonder whether any SMT paper would be asked to compare its findings with those of an EBMT approach? Is it the case nowadays that a paper on (the admittedly considerably less mainstream) EBMT cannot stand on its own merit?
> >
> > Nevertheless, EBMT has been chunking sentences into phrases from the word go. SMT has recently caught on to this idea, and results have improved quite dramatically. Despite this, one other comment was:
> >
> > "there is no discussion to how [our approach] would compare with more established techniques such as word-alignment using statistical models. Showing that [our approach] is comparable (or better) than the traditional way of acquiring phrase-alignment [SMT, references excluded here] would make this paper just great".
> >
> > I think this is my main concern: SMT is very well (and deservedly so) established nowadays as the main way to do MT. Unless you're an MT person, you'd think that it was the _only_ way to do MT, as here.
> >
> > We've all received rejections before, and dodgy reviews too. I too reject papers, and I'm sure I've given the odd dodgy review too! I hope I'm making it clear that's not my main concern here. Rather, I have these questions:
> >
> > 1. Can papers on EBMT succeed in getting published (especially in non-expert, i.e. MT-specific, conferences) without making direct comparisons to SMT?
> >
> > 2. Can anyone envisage a situation where an SMT paper was asked to compare its results against an MT model?
> >
> > 3. Has EBMT as a paradigm been 'muscled out' by the more dominant SMT approach?
> >
> > 4. Instead of signalling the 'bright new dawn' for EBMT, will the volume of [Carl & Way, 2003] instead come to be seen as the epitaph for this approach?
> >
> > OK, maybe I'm being a bit OTT here, but you get the point. Anyone care to indulge me here?
> >
> > Cheers,
> > Andy.

_______________________________________________
MT-List mailing list
[EMAIL PROTECTED]
http://www.computing.dcu.ie/mailman/listinfo/mt-list
