Re: [Mt-list] All you need to know about controlled language authoring and translation

Jeff Allen Wed, 09 Jun 2004 14:06:33 -0700

Quoting ByteCool Software <[EMAIL PROTECTED]>:
> Dear All,
> BabelCode Project has compiled a pdf describing its latest research in 
> controlled language authoring and translation. Read it at 
> http://www.babelcode.org/babelcode.pdf
> Best Regards,
> Yao
> http://www.babelcode.org


Yao,

You mention that the document babelcode.pdf v1.0 (dated June 4, 2004) contains 
all that is necessary to know about Controlled Language and Machine 
Translation. I'm not sure that this is accurate. See my comments below based on 
extracts from the document.

Your document states:
====
Chapter 4 The Design of a Formalized Natural Language discusses BabelCode's 
unique ideas and design of formalized natural languages
====

I would not say that the BabelCode approach is unique. It is a 
thematic/semantic role approach. Some controlled language systems, already 
described years ago in internal team presentations, function in much the same 
way as your recent one.


Your document states:
====
4. The Design of a Formalized Natural Language
4.1. Overview
Unlike other controlled translation solutions that use a loosely controlled 
source natural language specification, BabelCode uses a syntactically 
formalized natural language design which is more like a programming language 
such as C/C++. This eliminates all the syntactic ambiguities in the very 
beginning, and allows the human author and the computer to focus on semantic 
expression and disambiguation.
====

I do not know where you derive the idea that "all" other controlled translation 
solutions use loosely controlled source natural language specifications. There 
are different types of controlled languages. Some are monolingual and 
human-reader focused. Some are multilingual and machine-readable focused. Some, 
like General Motors Global English, consist of 12 rules, whereas the General 
Motors CASL controlled language has more specific goals and consists of 62 
rules. I taught Caterpillar CTE to technical authors and translators in the 
mid-90s, and given the dozens of specific linguistic rules to master, I would 
say that it was far from being loosely specified.

Sharon O'Brien and Ursula Reuther both individually presented papers at 
EAMT/CLAW2003 on the comparison of different CL types and rules.


Your document states:
====
I am dissatisfied with the principles, designs and availability of these 
systems and therefore decided to conduct my own research and development of a 
new one.
====

As for principles and designs, have you read the internal specifications of all 
the existing systems?
Concerning availability, keep in mind that many of these industry-customized 
systems and deployed systems at customer sites have been subject to proprietary 
information conditions.  The industrial and corporate players do not want to 
fund a major project just to give away all the investment to others (including 
competitors).


Your document states:
====
5. The Authoring and Translation Processes
5.1. Overview
The whole workflow of authoring and translation usually has two passes. In the 
first pass (composing pass) the author composes the document in a formal 
syntax, with no regard to the semantic ambiguity in word senses; then the 
computer tries to automatically resolve semantic ambiguities using various WSD 
approaches; then the author manually checks and corrects misinterpreted words 
in the second pass (reviewing pass). Finally the computer automatically 
generates target language translations using a generation engine and related 
lexical databases.
====
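
To make the quoted workflow concrete, here is a minimal Python sketch of the 
two passes as I read them: an automatic WSD step, a reviewing pass in which the 
author corrects flagged words, then generation. It is only an illustration 
under my own assumptions and not BabelCode's actual design. The sense table, 
the cue-word overlap scoring, the target lexicon and all function names are 
hypothetical, and the word-for-word lookup merely stands in for a real 
generation engine.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sense inventory: word -> list of (sense id, cue words).
# A real system would consult a lexical database; this toy table is an assumption.
SENSES = {
    "bank": [
        ("bank/finance", {"money", "loan"}),
        ("bank/river", {"river", "water"}),
    ],
}

# Hypothetical sense-to-French lexicon used by the toy generation step.
TARGET_LEXICON = {"bank/finance": "banque", "bank/river": "rive"}


@dataclass
class Token:
    word: str
    sense: Optional[str] = None  # chosen sense, only set for ambiguous words
    confident: bool = True       # False means "flag for the reviewing pass"


def auto_disambiguate(words):
    """Automatic WSD step: score each sense by cue-word overlap with the sentence."""
    context = set(words)
    tokens = []
    for word in words:
        candidates = SENSES.get(word)
        if not candidates:
            tokens.append(Token(word))  # unambiguous in this toy inventory
            continue
        scores = [len(cues & context) for _, cues in candidates]
        best = max(range(len(candidates)), key=lambda i: scores[i])
        # Confident only if one sense clearly wins with at least one cue word.
        clear_winner = scores[best] > 0 and scores.count(scores[best]) == 1
        tokens.append(Token(word, candidates[best][0], confident=clear_winner))
    return tokens


def flagged(tokens):
    """Words the author still has to check in the reviewing pass."""
    return [t.word for t in tokens if not t.confident]


def review(tokens, corrections):
    """Reviewing pass: the author confirms or overrides senses by hand."""
    for t in tokens:
        if t.word in corrections:
            t.sense = corrections[t.word]
            t.confident = True
    return tokens


def generate(tokens):
    """Toy generation: look up each sense, pass other words through unchanged."""
    return [TARGET_LEXICON.get(t.sense, t.word) for t in tokens]


if __name__ == "__main__":
    tokens = auto_disambiguate("walk to the bank".split())
    print(flagged(tokens))                       # ['bank']
    review(tokens, {"bank": "bank/river"})
    print(generate(tokens))                      # ['walk', 'to', 'the', 'rive']
```

Note that every word left in the flagged list costs the author a manual 
decision in the second pass, which is the overhead my comments below are 
concerned with.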

Your idea of two passes might be good for researchers, but it will be difficult 
to require highly stressed technical authors and editing teams, who need to 
collectively produce 500-1000 pages of source language documentation per day, 
to follow this.  These users have monthly and yearly quotas to meet, and their 
annual appraisal is based on those goals and quotas.  If external requirements 
mean producing less documentation, then it will be hard to get their buy-in. 

Two things to keep in mind: 1) users will only accept and use tools and 
guidelines when they see the benefit for themselves, so make sure that your 
implementation method will not slow them down; and 2) you will need to spend a 
lot of time and effort showing the users, on their own texts, how they can be 
more productive and effective with the tool and methodology. So be prepared to 
quickly become an expert in their technical domain and to prove to these very 
experienced users that you know their work and tasks better than they do.

Implementing Controlled Language and/or Translation technologies is not easy 
because language is not organized into perfect boxes. Language is very 
ambiguous, even when we think we have mastered it. Lexical and semantic 
ambiguity is just the beginning. Pragmatic aspects of ambiguity (known as the 
"garbage in" syndrome) are the norm, and in the real world they keep us far 
busier than we had planned for in theory.

Your theoretical approach has some good points, yet be careful not to 
underestimate the effort of actually implementing it for users in business and 
industrial contexts.

Regards,

Jeff Allen
Paris, France
[EMAIL PROTECTED]
Jeff's CL/MT/Speech technology web portal:
http://www.geocities.com/jeffallenpubs/

_______________________________________________
MT-List mailing list
[EMAIL PROTECTED]
http://www.computing.dcu.ie/mailman/listinfo/mt-list
