Re: [Mt-list] Reaction to Kovács' e-mail : philosophical and methodological points of view

Timo Honkela Mon, 23 Feb 2009 09:25:36 -0800

I tend to agree with Christian that interpreting is cognitively more 
demanding in terms of speed of performance, and I agree with Ferenc 
that contextuality is a very important phenomenon that is difficult
to deal with computationally. Please see
http://www.cis.hut.fi/tho/info/HonkelaIJCNN07.shtml for a specific
discussion of this from both the philosophical and the methodological
point of view ("Philosophical aspects of neural, probabilistic and fuzzy 
modeling of language use and translation").

Best regards,
Timo

On Sat, 21 Feb 2009, FERENC KOVACS wrote:

> To Lantrans, to comment on: Christian Boitet's (<[email protected]>) 
> comments on my posting of a video on the "garbage" produced by Language 
> Technology practitioners. 
Here we go. (He calls me Kovács úr, as if my name had been translated by an MT 
gadget, which is quite foreign to the general tone of forum communications and, 
with that emphasis, also an insult in my home country.)

COMMENTS
Interpreters are known to refuse to be taped, because what they produce, once 
transcribed, would be judged by Kovács úr "not a serious text, but a farce or a 
parody".

FK: I agree

He would be right, of course. But interpreters are certainly paid more by the 
hour than serious translators of written texts...
Why?
Simply, the task is not the same.
And interpreters do a very good job of conveying most of the meaning of a 
monologue or a discussion, under real-time constraints and lexical stress.
Cognitively speaking, interpreting is much more tiring than translating.

FK: I do not agree.
Cognitively speaking, interpreting requires immediate output; translating in 
writing gives you more time.

The difference between a professional and a lousy translator (of any modality) 
is:
        1. Speed of performance/reaction
        2. Number of mistakes made

Now, the situation is the same with MT of text and speech, relative to 
translators and interpreters.

FK: As long as you can call MT a lousy translator with the overhead of using 
complicated software for various conversions and 

Case 1: MT is made and used for helping translators like Kovács úr. 

FK: I do not agree.
MT is done to make money for the owner of the software, usually the agency; the 
translator is exposed to an aggressive intermediary in the market.

For example, with what Morphologics offers now, between Hungarian and 32 other 
languages, notably English, Russian, etc., he could probably increase his 
productivity by a factor of 2 to 3 -- but only if he postedits (always reading 
the source segment before looking at the "pretranslation" and trying to make a 
good translation out of it) instead of trying to revise (reading the MT result 
first). I personally recently postedited (online) results of Systran EF on 
rather technical texts on water and ecoloy at a rate of 500-800 words/hour, on 
7000 segments.

FK: And you may have left some mistakes in there, like "ecoloy" above. It has 
also happened that 25 microliter and 50 milliliter were found to differ in 
numbers only. Think about the consequences. 
With MT software you normally do not have the context of the original layout 
either, which leads to further problems. I will elaborate on this on request. 
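
A check of this kind can be sketched in a few lines. What follows is only a
minimal illustration, not a description of any existing MT or translation-memory
tool: the function names, the short unit list and the assumption that units are
spelled the same way in source and target are all made up for the example. It
compares number-plus-unit pairs, so that a changed unit is reported together
with a changed number.

```python
import re

# Assumed, deliberately short list of unit spellings (illustration only).
UNITS = ["microliter", "milliliter", "liter", "µl", "ml", "l"]
QUANTITY = re.compile(
    r"(\d+(?:[.,]\d+)?)\s*(" + "|".join(map(re.escape, UNITS)) + r")\b",
    re.IGNORECASE,
)

def quantities(segment):
    """Return the (number, unit) pairs found in a segment, in order."""
    return [(num.replace(",", "."), unit.lower())
            for num, unit in QUANTITY.findall(segment)]

def quantity_mismatches(source, target):
    """List the places where source and target disagree in number or unit."""
    src, tgt = quantities(source), quantities(target)
    problems = []
    if len(src) != len(tgt):
        # A quantity was dropped or added somewhere in the target.
        problems.append(("count", len(src), len(tgt)))
    for s, t in zip(src, tgt):
        if s != t:
            # Report the whole pair, so a unit change is as visible as a number change.
            problems.append(("pair", s, t))
    return problems
```

Calling quantity_mismatches("Add 25 microliter", "Add 50 milliliter") reports
the full pair on each side, which makes the unit change visible and not just
the change from 25 to 50.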

Case 2: MT is there to help people understand written or spoken utterances, and 
there is no translator and no interpreter there to do the job.
- obviously, there can't be one when you browse web pages, and no translator 
could possibly translate (or even "pretranslate") a web page in 1 second, which 
is less than the time needed to read it in the first place. Again, this is 
another task.
- for speech, there is also a practical and financial impossibility: no TV 
channel could hire interpreters round the clock to interpret into 22 other 
European languages.

FK: I do not agree. It is a market. If there are not enough people around, but 
there is a demand for it, then let people learn the trade and enter the 
market. If they find it a biting cost, make language/translator training more 
effective. There are ample solutions in that direction too.

These tasks are again different from the "help" tasks, and from the "human" 
tasks.

Now, looking at the footage kindly shown by Kovács úr, with no sound, the only 
thing I can say is that the efficiency of this system (containing no MT) is 
quite high, as I can understand not only the general topic of the discussions, 
but also most of the utterances, despite the numerous errors. Many of these 
errors would admittedly not be made by humans, but if a stenotypist 
transcribed and her output were fed into a program to turn it into correct 
running text (IBM-France did it in the 80's, it may be commercial), the 
stenotypist would stop working after some time and then we would have nothing.

FK: Comment: The footage has no sound, because the screen is in a company 
canteen (10 altogether) with the volume turned off, because the company does 
not want to disturb the lunchers. So the whole service is absolutely superfluous 
and outrageously costly.

Case 3: MT is there to help normal people (I mean: not translators, not 
even real bilinguals) translate in their domains from a language they know only 
a little or not at all. The "operational" architecture of the MT system has to 
be different, because it is again another task.

FK: Comment: I like your phrasing, "normal people". I guess you are one of them 
(and not a translator or bilingual).

What can be done is to present the user with
- the source text enhanced with annotations in his language (multiple "pidgin 
translation", a term introduced in 1971 by Brian Harris, professor of 
translatology, at that time director of the TAUM project at UdM, Montréal),
- many candidate pretranslations, factorized in such a way that only one 
appears, the "best trajectory" in the underlying controlled confusion network, 
and that it can be changed to another one, or directly edited once the user has 
understood the source segment, relying on those "linguistic crutches" and 
his/her good domain knowledge.
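
The second item can be illustrated with a small sketch. It is only an
illustration under stated assumptions: a confusion network is represented here
as a list of slots, each holding alternative words with scores, and the "best
trajectory" is simply the highest-scoring alternative in each slot. The class
name, the scores and the sample words are invented for the example and do not
describe any particular system.

```python
class ConfusionNetwork:
    """Hypothetical minimal confusion network: one list of (word, score) per slot."""

    def __init__(self, slots):
        self.slots = slots
        # Start on the best trajectory: the top-scoring alternative in each slot.
        self.choice = [max(range(len(s)), key=lambda i: s[i][1]) for s in slots]

    def alternatives(self, slot):
        """Words available in a slot, best first."""
        return [w for w, _ in sorted(self.slots[slot], key=lambda a: -a[1])]

    def choose(self, slot, word):
        """Switch one slot to another candidate picked by the user."""
        words = [w for w, _ in self.slots[slot]]
        self.choice[slot] = words.index(word)  # raises ValueError if unknown

    def text(self):
        """The single pretranslation currently shown to the user."""
        return " ".join(self.slots[i][c][0] for i, c in enumerate(self.choice))


# Invented sample data: three slots with two alternatives each.
net = ConfusionNetwork([
    [("the", 0.9), ("a", 0.1)],
    [("bank", 0.6), ("shore", 0.4)],
    [("closed", 0.7), ("shut", 0.3)],
])
print(net.text())        # the bank closed
net.choose(1, "shore")   # the user picks another candidate for the second slot
print(net.text())        # the shore closed
```

Only one candidate is visible at any time, but every slot keeps its
alternatives, so the user can move to another trajectory without retyping the
segment.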

FK: I know a couple of other things that also can be done. Go and get educated 
in Six Sigma to find out more about 21st century quality control.

That is clearly another task, for different persons.

Case 4: MT is there to help speakers of different languages converse (chat or 
spoken dialogue). Here, there is a possibility that
- interlocutors know to some extent a common language (scenario of VerbMobil-1)
- they can use some level of interactive disambiguation

For example, Converser for HealthCare (by M.Seligman, SpokenTranslation Inc) is 
designed for helping health personnel in the US converse with hispanophone 
patients and their families about almost any topic, not just health and 
medicine. To raise the quality level, the system offers
. in-built controls (over the result of speech recognition, and indirectly over 
translations, using reverse translation)
. interactive word sense disambiguation in the source language.

Again, the task is different, as it depends on the "translational situation", 
and the "operational architecture" of the MT system has to be different.

FK: Comment: The speakers of different languages really need to be helped, but 
MT rests on a wrong theoretical basis. It just cannot cope with context, which 
is also part of meaning, a concept that academic linguists and MT practitioners 
avoid even defining properly.

Conclusions:

1) one cannot judge MT in its various forms "intrinsically", as if its task 
were the same as that of professional translators like Kovács úr.

2) the particular system (speech-to-text) he refers to seems to me to merit 
something like a B+ (15/20) using a task-related measure.

3) the remark "You do not need 99 percent of the functionalities available. 
Just think about that." applies very well not only to Microsoft software, but 
also to speech recognition, translation, etc. In this case (TV), we do not 
need the 99% of text quality a professional stenotypist followed by a program 
could produce after a few seconds. What we need, and get, is the 10% (2/20) of 
"linguistic quality", the real-time behavior, and the ergonomics, that together 
allow us to follow the TV show in real time.

4) returning to MT: always remember the evaluation of Systran Rus-Eng at 
Euratom (Ispra) in 1972: it got 18/20 (A+) from its users (nuclear scientists), 
and 2/20 (D--) from teachers of translation.

All translators, please realise that MT has never been there to replace you, 
but can help you a lot more than translation memories in many cases.

FK: Comment: Try to convince the translators that life is better for them by 
having MT around. Just listen to the posts you will be getting after I have 
crossposted yours to Lantra.

Best regards,

Xan
--
-------------------------------------------------------------------------
Christian Boitet
(Pr. Université Joseph Fourier)
      ======= Notez svp /Please note ======= GETA --> GETALP ============
Groupe d'Etude pour la Traduction Automatique
                 et le Traitement Automatisé des Langues et de la Parole
G        E             T          A              L                P

NOUVEAU FAX  / NEW FAX  (1/8/08)

GETALP, LIG-campus, BP 53              (ex: GETA, CLIPS, IMAG-campus)        
Tel: +33 (0)4 76 51 43 55 / 51 48 17  Fax: +33 (0)4 76 63 56 86
385, rue de la Bibliothèque            Mel: [email protected]    
38041 Grenoble Cedex 9, France

You too FK

Ferenc Kovacs
alias Frank
Genezistan
"Starting all over"
+44 7770654068 (Vodafone)
http://www.firkasz.com/news.php?item.58.8
and 
http://translationjournal.net/journal/46meaning.htm
http://www.facebook.com/album.php?aid=2003546&l=1e704&id=1107563373

5 St. Mary's Place
Newbury, Berkshire
RG14 1EG
U.K.

________________________________
From: Christian Boitet <[email protected]>
To: [email protected]
Cc: FERENC KOVACS <[email protected]>; [email protected]; 
[email protected]; Gábor Prószéky <[email protected]>; 
[email protected]
Sent: Saturday, 21 February, 2009 2:41:56 AM
Subject: [Mt-list] Reaction to Kovács úr e-mail : speech recognition and 
transcription to text on Skynews UK

Dear all,                20/2/09

At 18:30 +0000 20/02/09, FERENC KOVACS wrote:
To see a sample, see the actual footage below:

http://www.firkasz.com/news.php 

interesting...

Ferenc Kovacs
alias Frank
Genezistan
"Starting all over"
+44 7770654068 (Vodafone)
www.firkasz.com and 
http://translationjournal.net/journal/46meaning.htm
http://www.facebook.com/album.php?aid=2003546&l=1e704&id=1107563373

5 St. Mary's Place
Newbury, Berkshire
RG14 1EG
U.K.
_______________________________________________
Mt-list mailing list

--
Timo Honkela, Chief Research Scientist, PhD, Docent
Adaptive Informatics Research Center
Helsinki University of Technology
P.O.Box 5400, FI-02015 TKK

timo.honkela at tkk.fi,  http://www.cis.hut.fi/tho/

_______________________________________________
Mt-list mailing list
