>On Wed, 6 Oct 1999 "Marcos" <[EMAIL PROTECTED]> wrote:
>> Hello all,
>> I'd like to know what's the best way to develop a machine translation
>> program for a language for which there are still no such translation
>> tools. The goal is to make it translatable to several languages.
At 08:18 08/10/99 +0900, One-Soon Her wrote:
>The best way, in terms of money, time, and effort, is of course
>to build only the translation memory data and linguistics in an
>existing MT environment that has proven to be effective to many
>other languages. L&H-AppTek has an MT Toolkit that fits the
>description. See their web site www.apptek.com for more.
At Thu, 7 Oct 1999 22:31:05 Glenn Akers <[EMAIL PROTECTED]> wrote:
>Subject: RE: [MT-List] MT development
>Dear All,
>Everyone should be aware that Prof. One-Soon Her is the Apptek
>representative in Taiwan, and therefore the advice given is given as
>a company rep. I have requested information from L&H about their
>Arabic/Korean offerings, but they have not yet provided them.
>I'd like to actually see their toolkit.
>Glenn Akers
>Language Engineering Corp.
Translation Memory (TM) tools are, in general, a commercialized product of
what we've been calling Example-Based Machine Translation (EBMT) in
academic research since Nagao started working on it in the early 1980s.
The two differ slightly, as various references point out, but the
relationship between TM and EBMT systems is stated explicitly in the
following:
ALLEN, Jeffrey and Christopher HOGAN. 1998. Expanding lexical coverage of
parallel corpora for the Example-Based Machine Translation approach. In
Proceedings of the First International Conference on Language Resources and
Evaluation, 28-30 May 1998, Granada, Spain. Vol. 2, pp. 747-754.
It should be on page 747 (or 748) that Chris and I state that EBMT is like
TM, with regard to our work on the DIPLOMAT project
(www.lti.cs.cmu.edu/Research/Diplomat/). I'm extracting the passage below
from the original LaTeX file, but I don't have the corresponding
bibliography file, so the shortcut citation keys remain unresolved:
"EBMT (also known as translation by analogy \cite{nagao:84}) is an MT
technique which builds on the notion of translation memory (TM), a
technique used to improve the performance of human translators by
capitalizing on the reusability of already translated sentences, phrases
and terms \cite{eagles:95,heyn:95}. EBMT translates by matching the text to
be translated against a parallel corpus (bi-text) consisting of examples
(usually sentences) in both the source and target languages. After
suitable examples are found, additional processing transforms non-exact
matches into correct translations. In the simplest case, where the input
exactly matches some example in the corpus, no processing is necessary. In
an inexact match, EBMT processing may either translate only the parts of
the example which match, or it may modify the translation from the corpus
in order to make it a correct translation of the input sentence. In
contrast to other MT designs (e.g. Statistical MT) which also use
parallel corpora to perform translation, EBMT does not rely on the relative
frequencies of the words in the corpus."
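For readers less familiar with the retrieval step described above, here is
a minimal sketch of matching an input sentence against a bi-text, assuming
a toy hypothetical corpus (BITEXT) and using Python's difflib word-level
ratio as a stand-in for a real fuzzy-match score. It covers only the
exact/inexact retrieval step, not the adaptation of inexact matches, and is
not the DIPLOMAT implementation.

```python
from difflib import SequenceMatcher

# Hypothetical toy bi-text: (source sentence, target sentence) pairs.
BITEXT = [
    ("the meeting starts at noon", "la réunion commence à midi"),
    ("the meeting ends at noon", "la réunion se termine à midi"),
    ("where is the station", "où est la gare"),
]


def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1]; a stand-in for a real fuzzy-match metric."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def retrieve(sentence: str, threshold: float = 0.7):
    """Return (score, source, target) for the closest example, or None.

    A score of 1.0 is an exact match (no further processing needed);
    anything between the threshold and 1.0 is an inexact match that a
    full EBMT system would still have to adapt.
    """
    best = max(BITEXT, key=lambda pair: similarity(sentence, pair[0]))
    score = similarity(sentence, best[0])
    if score < threshold:
        return None
    return score, best[0], best[1]


print(retrieve("where is the station"))       # exact match
print(retrieve("the meeting starts at one"))  # inexact match, score 0.8
```

The 0.7 threshold is an arbitrary illustrative value; TM tools typically let
the user set the minimum fuzzy-match percentage.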
And here is a recent statement that a TM implementation is becoming like EBMT:
GARCIA, Xavier. 1999 (in press). Beyond "fuzzy matching": The Déjà Vu
approach to reusing Language Resources. ELRA Newsletter, Vol. 4, No. 3,
July-September 1999. (Page numbers not yet assigned.)
"Third (a power feature still under development), Déjà Vu will soon be
able to independently search the whole content of the translation database
in order to assemble translations of almost entirely new sentences, making
it up of bits and pieces spread out all over the memory. In other words,
using language to produce language. This is very similar to Example-Based
Machine Translation."
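The "bits and pieces" idea in the quote above can be illustrated with a
minimal sketch that covers an input sentence, left to right, with the
longest fragments found in a phrase-level memory. The FRAGMENTS table and
the greedy longest-match strategy are assumptions for illustration only;
this is not how Déjà Vu implements the feature. It also keeps source word
order, so any reordering a real system would need is ignored.

```python
from __future__ import annotations

# Hypothetical phrase-level memory: source fragment -> target fragment.
FRAGMENTS = {
    "the meeting": "la réunion",
    "starts": "commence",
    "at noon": "à midi",
    "tomorrow": "demain",
}


def assemble(sentence: str, memory: dict[str, str]) -> list[str]:
    """Greedy longest-match cover of the input by memory fragments.

    Words that no fragment covers are passed through in [brackets] so a
    human translator can see what is still untranslated.
    """
    words = sentence.split()
    output: list[str] = []
    i = 0
    while i < len(words):
        # Try the longest span starting at position i first.
        for j in range(len(words), i, -1):
            span = " ".join(words[i:j])
            if span in memory:
                output.append(memory[span])
                i = j
                break
        else:
            # No fragment starts here: pass the word through untranslated.
            output.append(f"[{words[i]}]")
            i += 1
    return output


print(assemble("the meeting starts tomorrow at noon", FRAGMENTS))
```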
It appears that fuzzy-matching technologies are resulting in a fuzzy line
between MT and TM.
And, as for the difficulty of processing Arabic and Korean in EBMT systems,
I think that the Carnegie Mellon Univ (CMU) and New Mexico State Univ
(NMSU) teams have written papers on this. It is best to contact Bob
Frederking and Sergei Nirenberg at the contact addresses indicated in
recent messages posted on this list.
Best,
Jeff
=================================================
Jeff ALLEN - Technical Manager/Directeur Technique
European Language Resources Association (ELRA) &
European Language resources - Distribution Agency (ELDA)
(Agence Européenne de Distribution des Ressources Linguistiques)
55, rue Brillat-Savarin
75013 Paris FRANCE
Tel: (+33) 1.43.13.33.33 - Fax: (+33) 1.43.13.33.30
mailto:[EMAIL PROTECTED]
http://www.icp.grenet.fr/ELRA/home.html
--
For MT-List info, see http://www.eamt.org/mt-list.html