I tend to agree with Christian that interpretation is cognitively more demanding from the point of view of speed of performance, whereas I agree with Ferenc that contextuality is a very important phenomenon and a difficult one to deal with computationally. Please see http://www.cis.hut.fi/tho/info/HonkelaIJCNN07.shtml for a specific discussion of this from both a philosophical and a methodological point of view ("Philosophical aspects of neural, probabilistic and fuzzy modeling of language use and translation").
Best regards,
Timo

On Sat, 21 Feb 2009, FERENC KOVACS wrote:

To Lantrans, to comment on: Christian Boitet's <[email protected]> comments on my posting a video on the "garbage" produced by Language Technology practitioners. Here we go. (He calls me Kovacs Úr, i.e. "Mr. Kovacs" in Hungarian, as if my name had been translated by an MT gadget. This is completely at odds with the general tone of forum communications, and with that emphasis it is also an insult in my home country.)

COMMENTS

Interpreters are known to refuse to be taped, because what they produce, once transcribed, would be judged by Kovács úr "not a serious text, but a farce or a parody".

FK: I agree.

He would be right, of course. But interpreters are certainly paid more by the hour than serious translators of written texts... Why? Simply, the task is not the same. And interpreters do a very good job of conveying most of the meaning of a monologue or a discussion, under real-time constraints and lexical stress. Cognitively speaking, interpreting is much more tiring than translating.

FK: I do not agree. Cognitively speaking, interpreting requires immediate output; translating in writing gives you more time. The difference between a professional and a lousy translator (of any modality) is:
1. Speed of performance/reaction
2. Number of mistakes made

Now, the situation is the same with MT of text and speech, relative to translators and interpreters.

FK: As long as you can call MT a lousy translator with the overhead of using complicated software for various conversions and

Case 1: MT is made and used for helping translators like Kovács úr.

FK: I do not agree. MT is done to make money for the owner of the software, usually the agency; the translator is exposed to an aggressive intermediary in the market.
For example, with what Morphologics offers now, between Hungarian and 32 other languages, notably English, Russian, etc., he could probably increase his productivity by a factor of 2 to 3. This holds only if he postedits (always reading the source segment before looking at the "pretranslation" and trying to make a good translation out of it) instead of trying to revise (reading the MT result first). I personally recently postedited (online) results of Systran EF on rather technical texts on water and ecoloy at a rate of 500-800 words/hour, on 7000 segments.

FK: And you may have left some mistakes in there, like "ecoloy" above. It has also happened that 25 microliters and 50 milliliters were found to differ only in the numbers. Think about the consequences. With MT software you normally do not have the context of the original layout either, which will lead to further problems. I will elaborate on this on request.

Case 2: MT is there to help people understand written or spoken utterances, and there is no translator and no interpreter there to do the job.
- Obviously, there can't be one when you browse web pages, and no translator could possibly translate (or even "pretranslate") a web page in 1 second, which is less than the time it takes to read it in the first place. Again, this is another task.
- For speech, there is also a practical and financial impossibility: no TV channel could hire interpreters round the clock to interpret into 22 other European languages.

FK: I do not agree. It is a market. If there are not enough people around but there is a demand for it, then let people learn the trade and enter the market. If they find that too costly, make language/translator training more effective. There are ample solutions in that direction too.

These tasks are again different from the "help" tasks, and from the "human" tasks.
Now, looking at the footage kindly shown by Kovács úr, with no sound, the only thing I can say is that the efficiency of this system (containing no MT) is quite high. I can understand not only the general topic of the discussions, but also most of the utterances, despite the numerous errors. Admittedly, many of these errors would not be made by humans. But if a stenotypist transcribed and her output were fed into a program to turn it into correct running text (IBM-France did this in the 80s; it may be commercial), the stenotypist would stop working after some time, and then we would have nothing.

FK: Comment: The footage has no sound because the screen is in a company canteen (10 altogether) with the volume turned off, because the company does not want to disturb the lunchers. So the whole service is absolutely superfluous and outrageously costly.

Case 3: MT is there to help normal people (I want to say: not translators, not even real bilinguals) translate in their domains from a language they know only a little or not at all. The "operational" architecture of the MT system has to be different, because it is again another task.

FK: Comment: I like your phrasing "normal people". I guess you are one of them (and not a translator or bilingual).

What can be done is to present the user with:
- the source text enhanced with annotations in his language (multiple "pidgin translation", a term introduced in 1971 by Brian Harris, professor of translatology, at that time director of the TAUM project at UdM, Montréal),
- many candidate pretranslations, factorized in such a way that only one appears (the "best trajectory" in the underlying controlled confusion network). It can be changed to another one, or edited directly when the user has understood the source segment, relying on those "linguistic crutches" and his/her good domain knowledge.

FK: I know a couple of other things that can also be done.
Go and get educated in Six Sigma to find out more about 21st-century quality control.

That is clearly another task, for different persons.

Case 4: MT is there to help speakers of different languages converse (chat or spoken dialogue). Here, it is possible that:
- the interlocutors know a common language to some extent (the VerbMobil-1 scenario),
- they can use some level of interactive disambiguation.

For example, Converser for HealthCare (by M. Seligman, SpokenTranslation Inc) is designed to help health personnel in the US converse with hispanophone patients and their families about almost any topic, not just health and medicine. To raise the quality level, the system offers:
- built-in controls (over the result of speech recognition, and indirectly over translations, using reverse translation),
- interactive word sense disambiguation in the source language.

Again, the task is different, as it depends on the "translational situation", and the "operational architecture" of the MT system has to be different.

FK: Comment: The speakers of different languages really do need to be helped, but MT is theoretically wrongly based. It just cannot cope with context, which is also part of meaning. Academic linguists and MT practitioners avoid even defining that concept properly.

Conclusions:
1) One cannot judge MT in its various forms "intrinsically", as if its task were the same as that of professional translators like Kovács úr.
2) The particular system (speech-to-text) he refers to seems to me to merit something like a B+ (15/20) using a task-related measure.
3) The remark "You do not need 99 percent of the functionalities available. Just think about that." applies very well not only to Microsoft software, but also to speech recognition, translation, etc. In this case (TV), we do not need the 99% text quality that a professional stenotypist followed by a program could produce after a few seconds.
What we need, and get, is the 10% (2/20) of "linguistic quality", the real-time behavior, and the ergonomics, which together allow us to follow the TV show in real time.
4) Returning to MT: always remember the evaluation of Systran Rus-Eng at Euratom (Ispra) in 1972. It got 18/20 (A+) from its users (nuclear scientists), and 2/20 (D--) from teachers of translation.

All translators, please realise that MT has never been there to replace you, but it can help you a lot more than translation memories in many cases.

FK: Comment: Try to convince translators that life is better for them with MT around. Just listen to the posts you will be getting after I have crossposted yours to Lantra.

Best regards,
Xan
-- 
-------------------------------------------------------------------------
Christian Boitet
(Pr. Université Joseph Fourier)
======= Notez svp / Please note =======
GETA --> GETALP
Groupe d'Etude pour la Traduction Automatique et le Traitement Automatisé des Langues et de la Parole
NOUVEAU FAX / NEW FAX (1/8/08)
GETALP, LIG-campus, BP 53 (ex: GETA, CLIPS, IMAG-campus)
385, rue de la Bibliothèque
38041 Grenoble Cedex 9, France
Tel: +33 (0)4 76 51 43 55 / 51 48 17
Fax: +33 (0)4 76 63 56 86
E-mail: [email protected]

You too,
FK

Ferenc Kovacs alias Frank Genezistan
"Starting all over"
+44 7770654068 (Vodafone)
http://www.firkasz.com/news.php?item.58.8 and http://translationjournal.net/journal/46meaning.htm
http://www.facebook.com/album.php?aid=2003546&l=1e704&id=1107563373
5 St. Mary's Place
Newbury, Berkshire RG14 1EG
U.K.
________________________________
From: Christian Boitet <[email protected]>
To: [email protected]
Cc: FERENC KOVACS <[email protected]>; [email protected]; [email protected]; Gábor Prószéky <[email protected]>; [email protected]
Sent: Saturday, 21 February, 2009 2:41:56 AM
Subject: [Mt-list] Reaction to Kovács úr e-mail : speech recognition and transcription to text on Skynews UK

Dear all, 20/2/09

At 18:30 +0000 20/02/09, FERENC KOVACS wrote:

To see a sample, see the actual footage below: http://www.firkasz.com/news.php

interesting...

-- 
Timo Honkela, Chief Research Scientist, PhD, Docent
Adaptive Informatics Research Center
Helsinki University of Technology
P.O.Box 5400, FI-02015 TKK
timo.honkela at tkk.fi, http://www.cis.hut.fi/tho/
_______________________________________________ Mt-list mailing list
