Dear Mr Kovács, 21/2/09
At 4:26 +0000 21/02/09, FERENC KOVACS wrote:
To Lantrans to comment on Christian Boitet's
comments on my posting a video on the
"garbage" produced by Language Technology
practitioners.
here we go: (He calls me Kovacs Úr as if my name
was translated by a MT gadget, making it
absolutely strange to the general tone of forum
communications, and also an insult in my home
country with that emphasis.)
Please accept my apologies if that could be
misconstrued as an insult. It was not at all my
intention.
Kovács úr means simply "Mr Kovacs" in Hungarian.
It is not an insult. It was simply a wink -- I
have actually liked Hungary and Hungarians so
much since I met some here and there in 1971 that
I have tried to learn the language (at a nyári
egyetemen then alone) and still speak and read it
to a certain extent.
COMMENTS
Interpreters are known to refuse to be taped,
because what they produce, once transcribed,
would be judged by Kovács úr "not a serious
text, but a farce or a parody".
FK: I agree
He would be right, of course. But interpreters
are certainly paid more by the hour than serious
translators of written texts...
Why?
Simply, the task is not the same.
And interpreters do a very good job of conveying
most of the meaning of a monologue or a
discussion, under real-time constraints and
lexical stress.
Cognitively speaking, interpreting is much more tiring than translating.
FK. I do not agree.
Cognitively speaking interpreting requires
immediate output, translating in writing gives
you more time.
Then you agree!
Interpreters doing simultaneous interpretation
usually work in pairs and interpret for 20-30 min
only before switching.
Translators can translate for hours without
stopping (but I did not say it is not tiring).
The difference between a professional and a
lousy translator (of any modality) is:
1. Speed of performance/reaction
2. Number of mistakes made
Now, the situation is the same with MT of text
and speech, relative to translators and
interpreters.
FK: As long as you can call MT a lousy
translator with the overhead of using a
complicated software for various conversions and
Case 1: MT is made and used for helping translators like Kovács úr.
FK: Do not agree.
MT is done to make money for the owner of the
software, usually the agency, the translator is
exposed to an aggressive intermediator in the
market.
For example, with what Morphologics offers now,
between Hungarian and 32 other languages,
notably English, Russian, etc., he could
probably increase his productivity by 2 to 3 --
but only if he postedits (always reading the
source segment before looking at the
"pretranslation" and trying to make a good
translation out of it) instead of trying to
revise (reading the MT result first). I
personally recently postedited (online) results
of Systran EF on rather technical texts on water
and ecoloy at a rate of 500-800 words/hour, on
7000 segments.
FK: And you may have left in there some mistakes
like ecoloy above. It has also happened that
25 microliter and 50 milliliter were found to
differ in numbers only. Think about the
consequences.
Well... if professional translators did a
perfect job, there would be no need for
professional revision in the first place.
But the industrial practice is that, to translate
a standard page of 250 words in a technical
domain:
- a translator spends about 1 hour producing a first draft
- a revisor spends about 20 minutes polishing it
- the revisor is more qualified, and his 20
minutes are paid the same as 30 minutes of the
translator.
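To make the arithmetic concrete, here is a small Python sketch. It assumes (my reading, not a measured result) that the 250-word page above sets the first-draft rate, and that the 500-800 words/hour postediting rate quoted earlier can be compared with it directly. The function names are hypothetical, chosen only for illustration.

```python
PAGE_WORDS = 250            # standard technical page, as stated above
DRAFT_MINUTES = 60          # translator: about 1 hour for a first draft
REVISION_BILLED_MINUTES = 30  # 20 reviser minutes, paid as 30 translator minutes


def words_per_hour(words: float, minutes: float) -> float:
    """Convert a (words, minutes) pair into a words-per-hour rate."""
    return words * 60.0 / minutes


def speedup(postedit_wph: float, draft_wph: float) -> float:
    """Ratio of the postediting rate to the first-draft rate."""
    return postedit_wph / draft_wph


draft_rate = words_per_hour(PAGE_WORDS, DRAFT_MINUTES)

# Cost of one revised page, expressed in translator-equivalent minutes.
page_cost_minutes = DRAFT_MINUTES + REVISION_BILLED_MINUTES

# Postediting rates quoted earlier: 500 to 800 words/hour.
low = speedup(500, draft_rate)
high = speedup(800, draft_rate)

print(f"first draft: {draft_rate:.0f} words/hour")
print(f"revised page cost: {page_cost_minutes} translator-equivalent minutes")
print(f"postediting speedup: {low:.1f}x to {high:.1f}x")
```

Under these assumptions, postediting comes out at 2.0 to 3.2 times the first-draft rate, which is in line with the factor of 2 to 3 mentioned above. Any revision still needed after postediting is ignored here.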
It has happened to me several times to have to
revise heavily the translations of professional
(not lousy!) translators -- not only of my own
texts. There was a lot to correct.
To expect the output of MT to be perfect is even
more unrealistic (I would say "angelic") than to
expect the productions of human translators to be
perfect.
With MT software you normally do not have the
context of the original layout either, which
will lead to further problems. Will elaborate on
this on request.
If you mean that the software does not use the
information about the layout, like interpreting
an image to disambiguate a text near to it, you
are absolutely right.
MT is based on FORM, very rarely on real content.
There ARE a few MT systems based on content, like
Catalyst at Caterpillar (derived from KBMT-89, then
KANT-92, by CMU), using an ontology of the domain
of the documents.
Relying on explicit understanding is simply impossible in most situations.
Working on the form, even only at the text level
(as is done in "analogical translation") can
however work to a certain degree, because (that
has been shown in experiments on large corpora)
in NL, 96% of the analogies of form are also analogies of content.
Many say that statistical MT (rather,
probabilistic MT) can not work, but it works to a
certain extent, once enough corpora have been
gathered. It could not work without that property
of natural languages.
By contrast, if you try to build a compiler of a
programming language this way, you know in
advance it will not work AT ALL.
Case 2: MT is there to help people understand
written or spoken utterances, and there is no
translator and no interpreter there to do the
job.
- obviously, there can't be one when you browse
web pages, and no translator could possibly
translate (even "pretranslate") a web page in 1
second, which is less than the time to read it
in the first place). Again, this is another task.
- for speech, there is also a practical and
financial impossibility: no TV channel could
hire interpreters round the clock to interpret
into 22 other European languages.
FK: I do not agree. It is a market. If there is
not enough people around, but there is a demand
for it, then let the people learn the trade and
enter the market. If they find it a biting cost,
make language/translator training more
effective. There are ample solutions in that
direction too.
Yes but... that is not an answer.
I repeat: NO HUMAN CAN POSSIBLY ANSWER THE NEED
of getting a dynamic page translated on the fly
(in less than 1 second).
Hence, THE TASK IS DIFFERENT, and there is no
POSSIBLE competition with humans, whatever their
number and qualifications.
These tasks are again different from the "help"
tasks, and from the "human" tasks.
Now, looking at the footage kindly shown by
Kovács úr, with no sound, the only thing I can
say is that the efficiency of this system
(containing no MT) is quite high, as I can
understand not only the general topic of the
discussions, but also most of the utterances,
despite the numerous errors. Many of these
errors would admittedly not be made by humans,
but if a stenotypist would transcribe and her
output would be fed into a program to turn it
into correct running text (IBM-France did it in
the 80's, it may be commercial), the stenotypist
would stop working after some time and then we
would have nothing.
FK: Comment: The footage has no sound, because
the screen is in a company canteen (10
altogether) with the volume turned off, because
the company does not want to disturb the
lunchers. So the whole service is absolutely
superfluous and outrageously costly.
You say it. But, if this service continues to be
offered, maybe it is because the lunchers like it
more than a superloud sound in such a noise that
nothing can be understood anyway.
Now, please explain what you mean by
"outrageously expensive", and expensive for whom.
Apparently, the lunchers don't pay extra money
for this service, and the canteen is probably not
rich enough to pay "outrageously expensive"
services.
Case 3: MT is there to help normal people (I
want to say: not translators, not even real
bilinguals) translate in their domains from a
language they know only a little or not at all.
The "operational" architecture of the MT system
has to be different, because it is again another
task.
FK: Comment: I like your phrasing normal
people. I guess you are one of them (and not a
translator or bilingual)
I happen to speak very well at least French,
English, German, Russian (wrote articles and
delivered lectures in all of them), quite well
Spanish, Italian, to some degree Japanese,
Hungarian, Malay, and to have studied, beside
Latin and Greek, a few other modern languages
(Thai, Chinese, Portuguese, Vietnamese, Hindi,
Arabic) to understand their system.
The situation I refer to is the situation you
yourself would face if you had to translate some
text from a language you don't know or know only
a little, but concerning one of your hobbies.
Examples are teachers wanting to adapt textbooks
to their language when professional translation
of the textbooks is financially impossible, a
most frequent case.
What can be done is to present the user with
- the source text enhanced with annotations in
his language (multiple "pidgin translation", a
term introduced in 1971 by Brian Harris,
professor of translatology, at that time
director of the TAUM project at UdM, Montréal),
- many candidate pretranslations, factorized in
such a way that 1 only appears, the "best
trajectory" in the underlying controlled
confusion network, and that it can be changed to
another one, or directly edited when the user
has understood the source segment, relying on
those "linguistic crutches" and his/her good
domain knowledge.
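A minimal Python sketch of that second presentation mode follows. It is a simplification under stated assumptions: the network is modelled as a plain sequence of slots, each holding scored alternatives, and the "best trajectory" is simply the top-scoring alternative in every slot. The class name, the scores, and the example words are all hypothetical and do not come from any actual system.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Slot:
    """One position in the (simplified) confusion network."""
    alternatives: List[Tuple[str, float]]  # (candidate word, score), made-up values
    chosen: Optional[int] = None           # index picked by the user, if any

    def best_index(self) -> int:
        # Index of the highest-scoring alternative in this slot.
        return max(range(len(self.alternatives)),
                   key=lambda i: self.alternatives[i][1])

    def current(self) -> str:
        # The user's choice wins; otherwise fall back to the best alternative.
        index = self.best_index() if self.chosen is None else self.chosen
        return self.alternatives[index][0]


def displayed(network: List[Slot]) -> str:
    """The single pretranslation shown to the user."""
    return " ".join(slot.current() for slot in network)


# Hypothetical network for one segment.
network = [
    Slot([("the", 0.9), ("a", 0.1)]),
    Slot([("bank", 0.6), ("shore", 0.3), ("bench", 0.1)]),
    Slot([("is", 1.0)]),
    Slot([("closed", 0.7), ("shut", 0.3)]),
]

before = displayed(network)   # best trajectory
network[1].chosen = 1         # the user, knowing the domain, picks "shore"
after = displayed(network)

print(before)
print(after)
```

Only one candidate is visible at a time, yet every alternative stays one click away, which is the point of factorizing the candidates instead of listing them in full.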
FK: I know a couple of other things that also
can be done. Go and get educated in Six Sigma to
find out more about 21st century quality control.
... and how would that help such people in such situations?
That is clearly another task, for different persons.
Case 4: MT is there to help speakers of
different languages converse (chat or spoken
dialogue). Here, there is a possibility that
- interlocutors know to some extent a common
language (scenario of VerbMobil-1)
- they can use some level of interactive disambiguation
For example, Converser for HealthCare (by
M.Seligman, SpokenTranslation Inc) is designed
for helping health personnel in the US converse
with hispanophone patients and their families
about almost any topic, not just health and
medicine. To raise the quality level, the system
offers
. in-built controls (over the result of speech
recognition, and indirectly over translations,
using reverse translation)
. interactive word sense disambiguation in the source language.
Again, the task is different, as it depends on
the "translational situation", and the
"operational architecture" of the MT system has
to be different.
FK: Comment: The speakers of different languages
really need to be helped, but MT is
theoretically wrongly based. It just cannot cope
with context, which is also part of meaning, a
concept that academic linguists and the MT
practitioners elude even to define properly.
What is wrongly based is what you say about MT,
both about its tasks, and about the technology
behind. There are many types of MT systems...
Some MT systems can use the context of the whole document they are applied to.
All MT systems indirectly use the context of a
domain and a typology (if only because they allow
combining dictionaries and defining priorities --
not enough, but something in the right direction).
In some systems, WSD (word sense disambiguation)
is also done by methods using the whole available
context (maybe not only a particular document,
but a corpus).
In some systems (AS-Transac by Toshiba),
syntactic preferences are also recomputed after a
first pass on the whole document.
Of course, MT systems do not use context and background as well as humans.
So what? The point is that they are there to help
humans, or to do a job which resembles a human
job but which no human can do (again, "translate"
a Web page in 1 second).
It is like transport.
Cars can't drive you home when you are unable to
drive. Horses did bring their masters home after
heavy drinking.
Jets don't flap their wings, and cannot bring you home.
But they do fly and bring you near your home far
quicker than if you would take a train or walk
the whole way.
The various brands of MT are TOOLS to be used by HUMANS to achieve some GOALS.
It is not surprising that their TASKS are different...
Unless you understand that and stop acting as if
MT systems were supposed to be robots mimicking
what human translators do, you will continue to
be as aggressive, but that will lead nowhere.
A better idea would be to try and see how you
could make the best of available technology --
for YOUR job.
Conclusions:
1) one cannot judge MT in its various forms
"intrinsically", as if its task would be the
same as that of professional translators like
Kovács úr.
2) the particular system (speech-to-text) he
refers to seems to me to merit something like a
B+ (15/20) using a task-related measure.
3) the remark "You do not need 99 percent of the
functionalities available. Just think about
that." applies very well not only to Microsoft
software, but also to speech recognition,
translation, etc. In this case (TV), we do not
need the 99% of text quality a professional
stenotypist followed by a program could produce
after a few seconds. What we need, and get, is
the 10% (2/20) of "linguistic quality", the
real-time behavior, and the ergonomy, that
together allow us to follow the TV show in real
time.
4) returning to MT: always remember the
evaluation of Systran Rus-Eng at Euratom (Ispra)
in 1972: it got 18/20 (A+) from its users
(nuclear scientists), and 2/20 (D--) by teachers
of translation.
All translators, please realise that MT has
never been there to replace you, but can help
you a lot more than translation memories in many
cases.
FK: Comment: Try to convince the translators
that life is better for them by having MT
around. Just listen to the posts you will be
getting after I have crossposted yours to Lantra.
On this matter, may I advise you to consult
http://www.geocities.com/mtpostediting/.
That site (by Jeff Allen) can help translators
understand what is true and false about MT, and,
most important, how to make the best of it.
Another thing is that translators like you might
cooperate with researchers to improve MT systems
with methods and tricks of the trade. Please tell
us if you like the idea.
It has been put into practice since the 80's at
least at PAHO (Engspan and Spanam systems by
Marjorie Leon and her group), producing a very
fruitful synergy, and the output of their MT
systems can in some domains/typologies be
considered as equivalent to first drafts by
junior translators.
Best regards,
Xan
You too FK
Ferenc Kovacs
alias Frank
Genezistan
"Starting all over"
+44 7770654068 (Vodafone)
http://www.firkasz.com/news.php?item.58.8
and
http://translationjournal.net/journal/46meaning.htm
http://www.facebook.com/album.php?aid=2003546&l=1e704&id=1107563373
5 St. Mary's Place
Newbury, Berkshire
RG14 1EG
U.K.
--
-------------------------------------------------------------------------
Christian Boitet
(Pr. Universite' Joseph Fourier)
======= Notez svp /Please note ======= GETA --> GETALP ============
Groupe d'Etude pour la Traduction Automatique
et le Traitement Automatisé des Langues et de la Parole
G E T A L P
NOUVEAU FAX / NEW FAX (1/8/08)
GETALP, LIG-campus, BP 53 (ex: GETA, CLIPS, IMAG-campus)
Tel: +33 (0)4 76 51 43 55 / 51 48 17 Fax: +33 (0)4 76 63 56 86
385, rue de la Bibliothe`que Mel: [email protected]
38041 Grenoble Cedex 9, France
_______________________________________________
Mt-list mailing list