ByteCool Software
Thu, 24 Jun 2004 11:10:55 -0700
Hi Jeff,
Sorry for the late response. Please see below for replies.
From: Jeff Allen <[EMAIL PROTECTED]>
To: ByteCool Software <[EMAIL PROTECTED]>
CC: [EMAIL PROTECTED]
Subject: Re: [Mt-list] All you need to know about controlled language authoring and translation
Date: Wed, 9 Jun 2004 22:59:38 +0200
Quoting ByteCool Software <[EMAIL PROTECTED]>:
> Dear All,
> BabelCode Project has compiled a pdf describing its latest research in
> controlled language authoring and translation. Read it at
> http://www.babelcode.org/babelcode.pdf
> Best Regards,
> Yao
> http://www.babelcode.org
Yao,
You mention that the document babelcode.pdf v1.0 (dated June 4, 2004) contains
all that is necessary to know about Controlled Language and Machine
Translation. I'm not sure that this is accurate. See my comments below based on
extracts from the document.
Your document states:
====
Chapter 4 The Design of a Formalized Natural Language discusses BabelCode's
unique ideas and design of formalized natural languages
====
I would not say that the BabelCode approach is unique. It is a
thematic/semantic role approach. Some Controlled Language systems, already
described years ago in internal team presentations, function in much the same
way as your recent one.
Your document states:
====
4. The Design of a Formalized Natural Language
4.1. Overview
Unlike other controlled translation solutions that use a loosely controlled
source natural language specification, BabelCode uses a syntactically
formalized natural language design which is more like a programming language
such as C/C++. This eliminates all the syntactic ambiguities in the very
beginning, and allows the human author and the computer to focus on semantic
expression and disambiguation.
====
I do not know where you derive the idea that "all" other controlled translation
solutions use loosely controlled source natural language specifications. There
are different types of controlled languages. Some are monolingual and human-
reader focused. Some are multilingual and machine-readable focused. Some like
General Motors Global English consist of 12 rules. Yet the General Motors CASL
controlled language has more specific goals and consists of 62 rules. I
taught Caterpillar CTE to technical authors and translators in the mid-90s, and
I would say that, based on the dozens of specific linguistic rules to master,
it was far from being loosely specified.
Sharon O'Brien and Ursula Reuther both individually presented papers at EAMT/CLAW2003 on the comparison of different CL types and rules.
I've just downloaded and will read them. Thanks for the info.
Your document states:
====
I am dissatisfied with the principles, designs and availability of these
systems and therefore decided to conduct my own research and development of a
new one.
====
As for principles and designs, have you read the internal specifications of all
the existing systems?
Concerning availability, keep in mind that many of these industry-customized
systems and deployed systems at customer sites have been subject to proprietary
information conditions. The industrial and corporate players do not want to
fund a major project just to give away all the investment to others (including
competitors).
Your document states:
====
5. The Authoring and Translation Processes
5.1. Overview
The whole workflow of authoring and translation usually has two passes. In the
first pass (composing pass) the author composes the document in a formal
syntax, with no regard to the semantic ambiguity in word senses; then the
computer tries to automatically resolve semantic ambiguities using various WSD
approaches; then the author manually checks and corrects misinterpreted words
in the second pass (reviewing pass). Finally the computer automatically
generates target language translations using a generation engine and related
lexical databases.
====
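As an illustration only, the two passes quoted above could be sketched roughly
as follows in Python. This is not BabelCode's implementation: the SENSES
inventory, the Token class and both function names are hypothetical, the
"first listed sense" rule is a naive stand-in for a real WSD approach, and the
final target-language generation step is left out.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy sense inventory, invented for this sketch.
SENSES = {
    "bank": ["bank/finance", "bank/river"],
    "valve": ["valve/device"],
}

@dataclass
class Token:
    word: str
    sense: Optional[str] = None   # sense chosen so far, if any
    needs_review: bool = False    # True when the automatic choice was ambiguous

def composing_pass(words):
    """Pass 1: pick a sense automatically and flag ambiguous words."""
    tokens = []
    for word in words:
        candidates = SENSES.get(word, [])
        token = Token(word)
        if candidates:
            token.sense = candidates[0]            # naive stand-in for WSD
            token.needs_review = len(candidates) > 1
        tokens.append(token)
    return tokens

def reviewing_pass(tokens, corrections):
    """Pass 2: the author's corrections (index -> sense) override pass 1."""
    for index, sense in corrections.items():
        tokens[index].sense = sense
        tokens[index].needs_review = False
    return tokens
```

In this sketch the author would only need to look at tokens where needs_review
is set, which is where the cost of the second pass would come from.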
Your idea of two passes might be good for researchers, but it will be difficult
to require highly stressed technical authors and editing teams, who need to
collectively produce 500-1000 pages of source language documentation per day,
to follow this. These users have monthly and yearly quotas to meet, and their
annual appraisal is based on those goals and quotas. If external requirements
mean producing less documentation, then it will be hard to get their buy-in.
Two things to keep in mind: 1) users will only accept and use tools and
guidelines when they see the benefit for themselves, so make sure that your
implementation method will not slow down the users. And 2) you will need to
spend a lot of time and effort specifically showing the users on their own
texts how they can be more productive and effective with the tool and
methodology, so you need to be prepared to quickly become an expert in their
technical domain and prove to these very experienced users that you know their
work and tasks better than them.
Implementing Controlled Language and/or Translation technologies is not easy
because language is not organized into perfect boxes. Language is very
ambiguous, even when we think we have supposedly mastered it. Lexical and
semantic ambiguity is just the beginning. Pragmatic aspects of ambiguity
(known as the "garbage in" syndrome) are the norm and keep us far busier in the
real world than we had planned for in theory.
Your theoretical approach has some good points, yet be careful not to underestimate the difficulty of actually implementing it for users in business and industrial contexts.
Regards,
Jeff Allen
Paris, France
[EMAIL PROTECTED]
Jeff's CL/MT/Speech technology web portal: http://www.geocities.com/jeffallenpubs/
_______________________________________________
MT-List mailing list
[EMAIL PROTECTED]
http://www.computing.dcu.ie/mailman/listinfo/mt-list