Thank you for your email. I must make one significant correction: LogoVista is NOT the producer of the language pairs listed at http://www.morphologic.hu/public/tl/EN-EU_WebMT.htm. LEC (Language Engineering Company, LLC), www.lec.com, is the producer of all of the engines listed in your document.
Glenn A. Akers, Ph.D.
CEO and President
Language Engineering Company, LLC
www.lec.com
135 Beaver Street - Waltham, Massachusetts 02452 USA
Tel: +1 781 642 8900
Fax: +1 781 642 8904
Mobile: +1 617 780 9777
Blackberry: +1 617 259 8994

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tihanyi László
Sent: Monday, July 09, 2007 12:36 PM
To: [email protected]
Subject: [Mt-list] common API for online MT systems

A survey was recently conducted of online machine translation services that translate between English and the official languages of the EU: http://www.morphologic.hu/public/tl/EN-EU_WebMT.htm *

Summary: There are English translators for 17 languages (34 language pairs). The EU currently has 23 official languages. Five languages are missing: Estonian, Irish, Lithuanian, Maltese and Slovak. English-Slovak MT exists, but still has no online version. English-Lithuanian is under preparation in cooperation with PROMT. English is an official language in Ireland and Malta. So very soon only one country (Estonia) will be without a translator between English and one of its official languages. Coverage by population will then reach 99.73%.

There are 18 suppliers: Amebis, D'Agostini, IBM, Institute of Language and Communication, Kielikone, Linguatec, LocalTranslation, LogoVista, MorphoLogic, Poleng, ProLangs, PROMT, SkyCode, Sunda, Systran, Tranexp, Translendium, and Trident.

The broad coverage and the manageable number of participants suggested the following idea:

PROPOSAL: A common interface for MT servers should be defined. European companies, institutions and organizations with translation needs between all of these languages could use this common interface. The details of this API could be discussed in this forum. I suggest the name EAMT API. Later the API could be published on the EAMT web site, together with the list of services that can be reached via this API.
The connection between the MT services and the licensors would remain direct; the URL of the service would be a parameter of the API. Also, subscription methods for the services could remain unchanged and be managed directly between the service providers and the subscribers. This API would serve only as a guideline for web translators. Since the connection would remain direct, adding extra features would also remain possible.

The main goal of the common API is to declare an association of web translators, which together cover nearly all of the languages of Europe. The list of MT providers might also initiate or enliven the development of missing language pairs.

Some details, to start the debate:

The proposed API would look like the following: the caller addresses the URL of the MT service and sends the following parameters: language pair, text to be translated, format, encoding, domain, and a code that identifies the requestor. The reply would be the translation. The identification code could be used for both time-based and traffic-based services.

For uniformity, free services have to be excluded. Free trial services can be (and usually are) provided at the provider's own site to gain popularity and traffic. The published list of MT providers aims to reach big customers who need complete solutions for all of the languages. There would be no need even for a common test site; the list would contain only references to websites where the tests could be done.

Possible customers: first of all, the European Union itself, which uses EC-Systran. EC-Systran currently covers 10 language pairs (from English to: NL, FR, DE, EL, IT, ES, PT and into English from: FR, DE, ES). With the use of the EAMT API, the list of available language pairs could be extended by 26.

This idea was also prompted by the growing challenge of statistical MT systems, which promise full coverage of language pairs between European languages.
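To make the discussion concrete, the request described above (service URL plus language pair, text, format, encoding, domain and requestor code) could be sketched roughly as follows. This is only an illustration under stated assumptions: the proposal names the parameters but not their spelling, so the field names (`langpair`, `client_id`, and so on), the use of an HTTP POST with form encoding, and the example URL are all hypothetical.

```python
from dataclasses import dataclass, asdict
from urllib.parse import urlencode
from urllib.request import Request, urlopen


@dataclass
class TranslationRequest:
    # Hypothetical field names for the six parameters listed in the proposal.
    langpair: str              # language pair, e.g. "en-hu"
    text: str                  # text to be translated
    format: str = "text"       # e.g. "text" or "html"
    encoding: str = "utf-8"    # encoding expected for the reply
    domain: str = "general"    # subject domain
    client_id: str = ""        # code identifying the requestor


def build_request(service_url: str, req: TranslationRequest) -> Request:
    """Build the HTTP request; the service URL is itself a parameter."""
    # Assumption: parameters travel as a form-encoded POST body.
    body = urlencode(asdict(req)).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(service_url, data=body, headers=headers, method="POST")


def translate(service_url: str, req: TranslationRequest, timeout: float = 30.0) -> str:
    """Send the request; the reply body is taken to be the translation."""
    with urlopen(build_request(service_url, req), timeout=timeout) as resp:
        return resp.read().decode(req.encoding)
```

A caller would pass the provider's own URL, for example `translate("http://mt.example.invalid/translate", TranslationRequest("en-hu", "Hello", client_id="demo"))`, so the connection to each provider stays direct and only the parameter set is shared.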
The services in the above survey are basically rule-based, but the API is naturally open to statistical systems, too.

No comment on quality is provided, even though we have strong opinions on the output of the different systems. I left out quality information because it varies constantly with time, domain and evaluation method. The average number of solutions for an existing language pair is 3.1, so subscribers have an opportunity to make their own choices.

Possible extensions of the list: adding more pivot languages (e.g. French), adding minority languages (e.g. Catalan), adding European languages of non-member states (e.g. Russian), or adding every language of the world.

The idea needs support from a group of members, and we will also need support from EAMT to allow the publication of the API and the list of systems that implement it.

Waiting for your reflections,

Mr. Laszlo Tihanyi
MorphoLogic
Head of MT Department

* The list is based on the thirteenth edition of the Compendium (by John Hutchins) and was extended by my private research. The Compendium lists 231 language pairs that have MT systems. I studied only the 44 English-EU language pairs, of which 34 have MT systems. For these 34 language pairs the Compendium lists 480 solutions. My list contains only 137, as I ignored some systems for the following reasons: there was no online version, there were duplicate or alternative versions, the developer couldn't be identified, there was no English information at the site, or it was a different dialect.

_______________________________________________
Mt-list mailing list
