At 5:25 PM -0400 10/6/99, [EMAIL PROTECTED] wrote:
>Hello all,
>
>I'd like to know what's the best way to develop a machine translation
>program for a language for which there are still no such translation
>tools. The goal is to make it translatable to several languages.
>
>Would it be necessary to start from scratch, or would it be possible
>to take advantage of an existing multilingual system, adding our
>module to it?
Hello Xavo,
there is no single *best* way; it depends on what type of MT system you
need, and that depends on what your task is. Do you want high-quality
output, and are you working in a small domain? Or are you working with a
wide domain, documents from many authors, but can live with low-quality
output? Do you want immediate translation, because you are communicating
with others online? Or can the system do batch processing? Etc.
If you can live with low quality, then the most exciting new research
option is the EGYPT package, soon available from a website at Johns Hopkins
University near Washington, which builds an MT system automatically,
provided you give it enough data. "Enough data" means 1 to 2 million sentence
pairs in the source and target languages. For more information, contact
Kevin Knight at ISI, the team leader of the project that built this package
(it is a recreation of an older system called CANDIDE). His contact info
is [EMAIL PROTECTED] .
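To give a feel for what "builds an MT system automatically" from sentence
pairs can mean, here is a minimal sketch of EM training in the style of IBM
Model 1, the simplest of the word-translation models in the CANDIDE family.
This is not EGYPT's code; the function names and the three-sentence toy
corpus are assumptions made up for illustration, and a real system needs
the millions of pairs mentioned above plus much richer models.

```python
from collections import defaultdict

# Hypothetical toy corpus: (source tokens, target tokens) sentence pairs.
TOY_CORPUS = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]


def train_model1(pairs, iterations=10):
    """Estimate t[(target_word, source_word)] = P(target | source) with EM."""
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(tgt_vocab)
    # Before the first iteration every word pair is equally likely.
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of links per source word

        # E-step: spread each target word over the source words of its pair.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac

        # M-step: renormalise the expected counts into probabilities.
        t = defaultdict(
            float,
            {(tw, sw): c / total[sw] for (tw, sw), c in count.items()},
        )
    return t


def best_translation(t, src_word):
    """Return the target word with the highest P(target | src_word)."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == src_word}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    table = train_model1(TOY_CORPUS)
    for word in ["das", "Haus", "Buch", "ein"]:
        print(word, "->", best_translation(table, word))
```

The point of the sketch is only that the word correspondences are never
given to the program; they emerge from co-occurrence across sentence pairs,
which is why the amount of data matters so much.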
If you do not have that much data, then you could consider an example-based
MT (EBMT) approach, in which you store whatever has already been translated
in the domain and reuse it. Several companies offer translation memories. Bob Frederking at
CMU ([EMAIL PROTECTED]) works on a research system that integrates EBMT and
other methods.
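As a rough illustration of the translation-memory idea, the sketch below
stores previously translated sentences and retrieves the closest one for a
new input. It is not any vendor's product or the CMU system; the class name,
the 0.7 threshold, and the example sentences are all assumptions. It uses
token-level similarity from Python's standard difflib module, and it only
retrieves a match; a full EBMT system would go on to adapt the retrieved
translation to the new sentence.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Hypothetical minimal translation memory with fuzzy lookup."""

    def __init__(self):
        self._entries = []  # list of (source sentence, target sentence)

    def add(self, source, target):
        self._entries.append((source, target))

    def lookup(self, query, threshold=0.7):
        """Return (source, target, score) for the best match, or None."""
        query_tokens = query.lower().split()
        best = None
        best_score = 0.0
        for source, target in self._entries:
            # Similarity over word tokens: 2 * matches / total tokens.
            score = SequenceMatcher(
                None, query_tokens, source.lower().split()
            ).ratio()
            if score > best_score:
                best, best_score = (source, target), score
        if best is None or best_score < threshold:
            return None
        return best[0], best[1], best_score


if __name__ == "__main__":
    tm = TranslationMemory()
    tm.add("press the green button", "pulse el botón verde")
    tm.add("close the door", "cierre la puerta")
    print(tm.lookup("press the red button"))
    print(tm.lookup("open the window"))
```

The threshold is the main design choice: set it too low and the memory
returns misleading matches, set it too high and it only helps with exact
repetitions.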
If you have only, say, 1000 sentences, but you also have their parse trees,
then you can consider a machine learning approach, in which a system learns
the grammar to parse new sentences. Of course, you still need to build
something to convert the source trees into target trees and change words.
Contact Ulf Hermjakob at ISI ([EMAIL PROTECTED]) to learn more about this
approach.
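The "convert the source trees into target trees and change words" step can
be sketched roughly as follows. This is not the ISI system; the tree
encoding (nested tuples), the single reordering rule, and the tiny
English-to-Spanish lexicon are all assumptions invented for illustration,
and agreement is simply baked into the lexicon entries.

```python
def is_leaf(tree):
    """A leaf is (part_of_speech, word)."""
    return len(tree) == 2 and isinstance(tree[1], str)


def transfer(tree, reorder_rules, lexicon):
    """Map a source parse tree to a target tree.

    reorder_rules maps (label, child labels) to a new child order;
    lexicon maps source words to target words.
    """
    label, *children = tree
    if is_leaf(tree):
        word = children[0]
        # Unknown words are passed through unchanged.
        return (label, lexicon.get(word, word))

    new_children = [transfer(c, reorder_rules, lexicon) for c in children]
    pattern = tuple(c[0] for c in children)
    order = reorder_rules.get((label, pattern))
    if order is not None:
        new_children = [new_children[i] for i in order]
    return (label, *new_children)


def leaves(tree):
    """Read the words off a tree, left to right."""
    if is_leaf(tree):
        return [tree[1]]
    return [w for child in tree[1:] for w in leaves(child)]


# Hypothetical rule: in an NP, move the adjective after the noun.
RULES = {("NP", ("DT", "JJ", "NN")): (0, 2, 1)}
# Hypothetical lexicon, with gender agreement already chosen by hand.
LEXICON = {"the": "la", "red": "roja", "house": "casa", "burns": "arde"}

SOURCE_TREE = (
    "S",
    ("NP", ("DT", "the"), ("JJ", "red"), ("NN", "house")),
    ("VP", ("VBZ", "burns")),
)

if __name__ == "__main__":
    target_tree = transfer(SOURCE_TREE, RULES, LEXICON)
    print(" ".join(leaves(target_tree)))
```

Even in this toy form you can see where the hand work goes: every
structural difference between the two languages needs a rule, and every
word needs a lexicon entry.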
If you have none of that, no data at all, then you are in trouble. Try
to download as many lexicons and as much text as you can from the web,
and contact Sergei Nirenburg's team at New Mexico State University
for help with using their interfaces and lexicon acquisition tools to
build lexicons, rules, and grammars by hand: [EMAIL PROTECTED] .
Good luck,
E
----------------------------------------------------------------------------
Eduard Hovy
email: [EMAIL PROTECTED]          USC Information Sciences Institute
tel: 310-822-1511 ext 731         4676 Admiralty Way
fax: 310-823-6714                 Marina del Rey, CA 90292-6695
project homepage: http://www.isi.edu/natural-language/nlp-at-isi.html
--
For MT-List info, see http://www.eamt.org/mt-list.html