Dear Mr Kovács,                 21/2/09

At 4:26 +0000 21/02/09, FERENC KOVACS wrote:
To Lantra, for comment: Christian Boitet's <[email protected]> comments on my posting of a video on the "garbage" produced by Language Technology practitioners.

Here we go. (He calls me "Kovács úr" as if my name had been translated by an MT gadget, which is completely at odds with the general tone of forum communication and, with that emphasis, also an insult in my home country.)

Please accept my apologies if that could be misconstrued as an insult. It was not at all my intention.

Kovács úr simply means "Mr Kovács" in Hungarian.

It is not an insult. It was simply a wink: I have liked Hungary and Hungarians so much since I met some here and there in 1971 that I tried to learn the language (first at a summer university, then on my own), and I still speak and read it to a certain extent.


COMMENTS

Interpreters are known to refuse to be taped, because what they produce, once transcribed, would be judged by Kovács úr "not a serious text, but a farce or a parody".

FK: I agree


He would be right, of course. But interpreters are certainly paid more by the hour than serious translators of written texts...

Why?

Simply, the task is not the same.

And interpreters do a very good job of conveying most of the meaning of a monologue or a discussion, under real-time constraints and lexical stress.

Cognitively speaking, interpreting is much more tiring than translating.



FK. I do not agree.

Cognitively speaking, interpreting requires immediate output; translating in writing gives you more time.

Then you agree!

Interpreters doing simultaneous interpretation usually work in pairs and interpret for only 20-30 minutes before switching. Translators can translate for hours without stopping (but I did not say it is not tiring).

The difference between a professional and a lousy translator (of any modality) is:

1. Speed of performance/reaction
2. Number of mistakes made

Now, the situation is the same with MT of text and speech, relative to translators and interpreters.

FK: As long as you can call MT a lousy translator, with the overhead of using complicated software for various conversions and

Case 1: MT is made and used for helping translators like Kovács úr.

FK: Do not agree.

MT is done to make money for the owner of the software, usually the agency; the translator is exposed to an aggressive intermediary in the market.

For example, with what Morphologics now offers between Hungarian and 32 other languages, notably English, Russian, etc., he could probably increase his productivity by a factor of 2 to 3, but only if he postedits (always reading the source segment before looking at the "pretranslation", and trying to make a good translation out of it) instead of trying to revise (reading the MT result first). I personally recently postedited (online) the output of Systran EF on rather technical texts on water and ecoloy at a rate of 500-800 words/hour, on 7000 segments.



FK: And you may have left some mistakes in there, like "ecoloy" above. It has also happened that 25 microliters and 50 milliliters were found to differ only in their numbers. Think about the consequences.

Well... if professional translators did a perfect job, there would be no need for professional revision in the first place.

But the industrial practice is that, to translate a standard page of 250 words in a technical domain:
- a translator spends about 1 hour producing a first draft
- a revisor spends about 20 minutes polishing it
- the revisor is more qualified, and his 20 minutes are paid the same as 30 minutes of the translator.
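To make the arithmetic of this workflow explicit, here is a minimal Python sketch that expresses the cost of one standard page in translator-minutes. It uses only the figures quoted above (250-word page, 60-minute draft, 20-minute revision paid like 30 translator-minutes); the function names are hypothetical, and real rates vary by agency and domain.

```python
# Minimal sketch: the page-translation workflow described above, in numbers.
# All defaults come from the figures quoted in the message; the function
# names are illustrative, not part of any real tool.

PAGE_WORDS = 250                 # one standard page in a technical domain
DRAFT_MINUTES = 60.0             # translator's first draft
REVISION_MINUTES = 20.0          # reviser's polishing pass
REVISER_PAY_RATIO = 30.0 / 20.0  # 20 reviser minutes are paid like 30 translator minutes


def page_cost_in_translator_minutes(draft_min: float = DRAFT_MINUTES,
                                    revision_min: float = REVISION_MINUTES,
                                    pay_ratio: float = REVISER_PAY_RATIO) -> float:
    """Total cost of one page, expressed in translator-minutes."""
    return draft_min + revision_min * pay_ratio


def words_per_paid_hour(page_words: int = PAGE_WORDS) -> float:
    """Words delivered per translator-hour of cost, revision included."""
    return page_words / (page_cost_in_translator_minutes() / 60.0)


def postediting_speedup(postedit_words_per_hour: float,
                        draft_words_per_hour: float = PAGE_WORDS / (DRAFT_MINUTES / 60.0)) -> float:
    """Ratio of a post-editing rate to the first-draft rate (revision not counted)."""
    return postedit_words_per_hour / draft_words_per_hour


if __name__ == "__main__":
    print(page_cost_in_translator_minutes())    # 90.0
    print(round(words_per_paid_hour(), 1))      # 166.7
    for rate in (500, 800):                     # post-editing rates quoted earlier
        print(rate, postediting_speedup(rate))  # 2.0, then 3.2
```

With these figures, the 500-800 words/hour post-editing rate mentioned earlier corresponds to a factor of 2.0 to 3.2 over drafting from scratch, in line with the "2 to 3" estimate, although this comparison ignores the revision that would still follow.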

Several times I have had to heavily revise the translations of professional (not lousy!) translators, and not only of my own texts. There was a lot to correct.

To expect the output of MT to be perfect is even more unrealistic (I would say "angelic") than to expect the output of human translators to be perfect.


With MT software you normally do not have the context of the original layout either, which leads to further problems. I will elaborate on this on request.

If you mean that the software does not use the information about the layout, like interpreting an image to disambiguate a text near to it, you are absolutely right.

MT is based on FORM, very rarely on real content.
There ARE a few MT systems based on content, like Catalyst at Caterpillar (derived from KBMT-89, then KANT-92, by CMU), using an ontology of the domain of the documents.

Relying on explicit understanding is simply impossible in most situations.

Working on the form, even only at the text level (as is done in "analogical translation"), can however work to a certain degree because, as has been shown in experiments on large corpora, in natural languages 96% of the analogies of form are also analogies of content.
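To illustrate what "working on the form" can mean, here is a minimal Python sketch that solves proportional analogies A : B :: C : D purely on character strings. It handles only the special case where A and B differ in their endings, a strong simplification of the analogy solvers used in analogical translation; the function name is hypothetical. The sentence example is a formal analogy (appending "s") that is also an analogy of content (singular to plural).

```python
from typing import Optional


def solve_suffix_analogy(a: str, b: str, c: str) -> Optional[str]:
    """Solve a : b :: c : d on surface form only, assuming a and b differ
    only after their longest common prefix (a suffix rewrite).

    Returns None when c does not end with the suffix that a loses.
    """
    # Length of the longest common prefix of a and b.
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    old_suffix, new_suffix = a[i:], b[i:]
    if not c.endswith(old_suffix):
        return None
    # Slice explicitly: c[:-0] would wrongly give "" when old_suffix is empty.
    return c[:len(c) - len(old_suffix)] + new_suffix


if __name__ == "__main__":
    print(solve_suffix_analogy("translate", "translation", "relate"))  # relation
    print(solve_suffix_analogy("I like the red car", "I like the red cars",
                               "She washes the green car"))  # She washes the green cars
    print(solve_suffix_analogy("sing", "sang", "walk"))  # None
```

Such a solver knows nothing about meaning: it would just as readily return "buss" for car : cars :: bus : ?.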

Many say that statistical MT (or rather, probabilistic MT) cannot work, but it does work to a certain extent, once enough corpora have been gathered. It could not work without that property of natural languages.

By contrast, if you tried to build a compiler for a programming language this way, you would know in advance that it would not work AT ALL.

Case 2: MT is there to help people understand written or spoken utterances, and there is no translator and no interpreter there to do the job.

- obviously, there can't be one when you browse web pages: no translator could possibly translate (or even "pretranslate") a web page in 1 second, which is less than the time needed to read it in the first place. Again, this is another task.

- for speech, there is also a practical and financial impossibility: no TV channel could hire interpreters round the clock to interpret into 22 other European languages.



FK: I do not agree. It is a market. If there are not enough people around but there is demand for it, then let people learn the trade and enter the market. If they find that too costly, make language/translator training more effective. There are ample solutions in that direction too.

Yes but... that is not an answer.

I repeat: NO HUMAN CAN POSSIBLY ANSWER THE NEED of getting a dynamic page translated on the fly (in less than 1 second). Hence, THE TASK IS DIFFERENT, and there is no POSSIBLE competition with humans, whatever their number and qualifications.


These tasks are again different from the "help" tasks, and from the "human" tasks.

Now, looking at the footage kindly shown by Kovács úr, with no sound, the only thing I can say is that the efficiency of this system (which contains no MT) is quite high: I can understand not only the general topic of the discussions, but also most of the utterances, despite the numerous errors. Admittedly, many of these errors would not be made by humans. But if a stenotypist transcribed and her output were fed into a program to turn it into correct running text (IBM-France did this in the 1980s; it may be commercial), the stenotypist would stop working after some time, and then we would have nothing.

FK: Comment: The footage has no sound because the screen is in a company canteen (10 altogether) with the volume turned off, as the company does not want to disturb the lunchers. So the whole service is absolutely superfluous and outrageously costly.

So you say. But if this service continues to be offered, maybe it is because the lunchers prefer it to a super-loud sound amid so much noise that nothing could be understood anyway.

Now, please explain what you mean by "outrageously expensive", and expensive for whom. Apparently the lunchers don't pay extra for this service, and the canteen is probably not rich enough to pay for "outrageously expensive" services.


Case 3: MT is there to help normal people (I mean: not translators, not even real bilinguals) translate in their domains from a language they know only a little or not at all. The "operational" architecture of the MT system has to be different, because it is again another task.


FK: Comment: I like your phrasing, “normal” people. I guess you are one of them (and not a translator or bilingual).

I happen to speak at least French, English, German and Russian very well (I have written articles and delivered lectures in all of them), Spanish and Italian quite well, and Japanese, Hungarian and Malay to some degree, and I have studied, besides Latin and Greek, a few other modern languages (Thai, Chinese, Portuguese, Vietnamese, Hindi, Arabic) to understand their systems.

The situation I refer to is the one you yourself would face if you had to translate some text from a language you don't know, or know only a little, but concerning one of your hobbies. Examples are teachers wanting to adapt textbooks into their language when professional translation of the textbooks is financially impossible, a very frequent case.

What can be done is to present the user with

- the source text enhanced with annotations in his/her language (multiple "pidgin translations", a term introduced in 1971 by Brian Harris, professor of translatology, at that time director of the TAUM project at UdM, Montréal),

- many candidate pretranslations, factorized so that only one is displayed (the "best trajectory" through the underlying controlled confusion network), and so that it can be switched to another candidate, or edited directly once the user has understood the source segment, relying on those "linguistic crutches" and his/her good domain knowledge.
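The factorized presentation just described can be sketched with a toy confusion network: a sequence of slots, each holding scored alternatives for one position. This minimal Python sketch is an assumption about how such a structure might look, not the architecture of any particular system. It assumes slot scores are independent, so the "best trajectory" is simply the top-scoring word in each slot, and the user can switch any slot to another of its candidates. The class name, scores and example words are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConfusionNetwork:
    # One dict per position: candidate target word -> score.
    # The empty string stands for "no word here" (an epsilon arc).
    slots: List[Dict[str, float]]
    # Choices made by the user: slot index -> selected candidate.
    overrides: Dict[int, str] = field(default_factory=dict)

    def best_trajectory(self) -> List[str]:
        """Top-scoring candidate per slot, unless the user picked another."""
        return [self.overrides.get(i, max(slot, key=slot.get))
                for i, slot in enumerate(self.slots)]

    def choose(self, i: int, word: str) -> None:
        """Switch slot i to another of its candidates."""
        if word not in self.slots[i]:
            raise ValueError(f"{word!r} is not a candidate in slot {i}")
        self.overrides[i] = word

    def render(self) -> str:
        """The single pretranslation shown to the user."""
        return " ".join(w for w in self.best_trajectory() if w)


if __name__ == "__main__":
    cn = ConfusionNetwork([
        {"the": 0.9, "a": 0.1},
        {"bank": 0.6, "shore": 0.4},
        {"is": 0.7, "": 0.3},
        {"steep": 0.8, "abrupt": 0.2},
    ])
    print(cn.render())     # the bank is steep
    cn.choose(1, "shore")  # the user, knowing the domain, picks another sense
    print(cn.render())     # the shore is steep
```

Direct free editing of the displayed text is not modelled here; in this sketch it would simply replace the rendered string. A real system would also typically rescore trajectories with a language model rather than treat slots independently.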

FK: I know a couple of other things that also can be done. Go and get educated in Six Sigma to find out more about 21st century quality control.

... and how would that help such people in such situations?

That is clearly another task, for different persons.



Case 4: MT is there to help speakers of different languages converse (chat or spoken dialogue). Here, there is a possibility that

- interlocutors know to some extent a common language (scenario of VerbMobil-1)

- they can use some level of interactive disambiguation


For example, Converser for HealthCare (by M. Seligman, SpokenTranslation Inc) is designed to help health personnel in the US converse with Spanish-speaking patients and their families about almost any topic, not just health and medicine. To raise the quality level, the system offers

- built-in controls (over the result of speech recognition, and indirectly over translations, using reverse translation)

- interactive word sense disambiguation in the source language.



Again, the task is different, as it depends on the "translational situation", and the "operational architecture" of the MT system has to be different.



FK: Comment: The speakers of different languages really need to be helped, but MT is theoretically wrongly based. It simply cannot cope with context, which is also part of meaning, a concept that academic linguists and MT practitioners avoid even defining properly.

What is wrongly based is what you say about MT, both about its tasks, and about the technology behind. There are many types of MT systems...

Some MT systems can use the context of the whole document they are applied to.
All MT systems indirectly use the context of a domain and a typology (if only because they allow one to combine dictionaries and define priorities: not enough, but something in the right direction). In some systems, WSD (word sense disambiguation) is also done by methods using the whole available context (maybe not only a particular document, but a corpus). In some systems (AS-Transac by Toshiba), syntactic preferences are also recomputed after a first pass over the whole document.

Of course, MT systems do not use context and background as well as humans do.
So what? The point is that they are there to help humans, or to do a job which resembles a human job but which no human can do (again, "translate" a Web page in 1 second).



It is like transport.

Cars can't drive you home when you are unable to drive. Horses did bring their masters home after heavy drinking.

Jets don't flap their wings, and cannot bring you home.
But they do fly, and bring you near your home far more quickly than if you took a train or walked the whole way.

The various brands of MT are TOOLS to be used by HUMANS to achieve some GOALS.
It is not surprising that their TASKS are different...



Unless you understand that and stop acting as if MT systems were supposed to be robots mimicking what human translators do, you will continue to be as aggressive, but that will lead nowhere. A better idea would be to try and see how you could make the best of available technology, for YOUR job.


Conclusions:

1) one cannot judge MT in its various forms "intrinsically", as if its task were the same as that of professional translators like Kovács úr.

2) the particular system (speech-to-text) he refers to seems to me to merit something like a B+ (15/20) using a task-related measure.

3) the remark "You do not need 99 percent of the functionalities available. Just think about that." applies very well not only to Microsoft software, but also to speech recognition, translation, etc. In this case (TV), we do not need the 99% text quality that a professional stenotypist followed by a program could produce after a few seconds. What we need, and get, is the 10% (2/20) of "linguistic quality", the real-time behavior, and the ergonomics, which together allow us to follow the TV show in real time.

4) returning to MT: always remember the evaluation of Systran Rus-Eng at Euratom (Ispra) in 1972: it got 18/20 (A+) from its users (nuclear scientists), and 2/20 (D--) from teachers of translation.

All translators, please realise that MT has never been there to replace you, but can help you a lot more than translation memories in many cases.

FK: Comment: Try to convince the translators that life is better for them by having MT around. Just listen to the posts you will be getting after I have crossposted yours to Lantra.

On this matter, may I suggest consulting http://www.geocities.com/mtpostediting/. That site (by Jeff Allen) can help translators understand what is true and false about MT and, most importantly, how to make the best of it.

Another thing is that translators like you might cooperate with researchers to improve MT systems with methods and tricks of the trade. Please tell us if you like the idea.

It has been put into practice since the 1980s at least at PAHO (Engspan and Spanam systems by Marjorie Leon and her group), producing a very fruitful synergy, and the output of their MT systems can, in some domains/typologies, be considered equivalent to first drafts by junior translators.

Best regards,

Xan

You too, FK

Ferenc Kovacs
alias Frank
Genezistan
"Starting all over"
+44 7770654068 (Vodafone)
http://www.firkasz.com/news.php?item.58.8
and http://translationjournal.net/journal/46meaning.htm http://www.facebook.com/album.php?aid=2003546&l=1e704&id=1107563373

5 St. Mary's Place
Newbury, Berkshire
RG14 1EG
U.K.


--
-------------------------------------------------------------------------
Christian Boitet
(Pr. Université Joseph Fourier)
      ======= Notez svp /Please note ======= GETA --> GETALP ============
Groupe d'Etude pour la Traduction Automatique
                 et le Traitement Automatisé des Langues et de la Parole
G        E             T          A              L                P

NOUVEAU FAX  / NEW FAX  (1/8/08)

GETALP, LIG-campus, BP 53 (ex: GETA, CLIPS, IMAG-campus) Tel: +33 (0)4 76 51 43 55 / 51 48 17 Fax: +33 (0)4 76 63 56 86
385, rue de la Bibliothèque           Email: [email protected]
38041 Grenoble Cedex 9, France
_______________________________________________
Mt-list mailing list
