Hi Jeff,
Sorry for the late response. Please see below for replies.
From: Jeff Allen <[EMAIL PROTECTED]>
To: ByteCool Software <[EMAIL PROTECTED]>
CC: [EMAIL PROTECTED]
Subject: Re: [Mt-list] All you need to know about controlled language authoring and translation
Date: Wed, 9 Jun 2004 22:59:38 +0200
Quoting ByteCool Software <[EMAIL PROTECTED]>:
> Dear All,
> BabelCode Project has compiled a pdf describing its latest research in
> controlled language authoring and translation. Read it at
> http://www.babelcode.org/babelcode.pdf
> Best Regards,
> Yao
> http://www.babelcode.org
Yao,
You mention that the document babelcode.pdf v1.0 (dated June 4, 2004) contains
all that is necessary to know about Controlled Language and Machine
Translation. I'm not sure that this is accurate. See my comments below based on
extracts from the document.
I don't think I missed any major issue in outlining the architecture of the BabelCode solution, so this PDF can at least serve as introductory reading for both laymen and experts in this field. The tagline "everything you need to know about controlled translation and BabelCode" is meant to attract the widest possible range of readers...
Your document states:
====
Chapter 4 The Design of a Formalized Natural Language discusses BabelCode's unique ideas and design of formalized natural languages
====
I would not say that the BabelCode approach is unique. It is a
thematic/semantic role approach. The functioning of some Controlled language
systems, already described years ago in internal team presentations, does not
differ much from your recent one.
I agree. I knew that this was the thematic/semantic role approach used in interlingua representation practices. I just thought the C++-like "<object>.<method>(arguments)" syntax paradigm looked novel in controlled languages. Maybe it can't count as a "theoretical innovation". I will remove this claim in the next version of the PDF.
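As an illustration of the "<object>.<method>(arguments)" paradigm mentioned above, here is a minimal Python sketch of how such a statement might be split into thematic roles. It is not BabelCode's actual grammar, which this thread does not give: the example statement, the regular expression, and the names Statement and parse_statement are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical surface form: <object>.<method>(arg1, arg2, ...)
STATEMENT = re.compile(r"^\s*(\w+)\s*\.\s*(\w+)\s*\(\s*(.*?)\s*\)\s*$")


@dataclass
class Statement:
    agent: str       # the <object> written before the dot
    predicate: str   # the <method> name
    arguments: list  # remaining participants, in written order


def parse_statement(text):
    """Split one formal statement into agent, predicate and arguments."""
    match = STATEMENT.match(text)
    if match is None:
        raise ValueError(f"not a well-formed statement: {text!r}")
    agent, predicate, raw_args = match.groups()
    # Drop empty pieces so that "dog.sleep()" yields no arguments.
    arguments = [a.strip() for a in raw_args.split(",") if a.strip()]
    return Statement(agent, predicate, arguments)


if __name__ == "__main__":
    print(parse_statement("dog.bite(man)"))
```

Because the position of each participant is fixed by the syntax, the parser never has to guess which word is the agent; only the word senses remain open.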
Your document states:
====
4. The Design of a Formalized Natural Language
4.1. Overview
Unlike other controlled translation solutions that use a loosely controlled
source natural language specification, BabelCode uses a syntactically
formalized natural language design which is more like a programming language
such as C/C++. This eliminates all the syntactic ambiguities in the very
beginning, and allows the human author and the computer to focus on semantic
expression and disambiguation.
====
I do not know where you derive the idea that "all" other controlled translation
solutions use loosely controlled source natural language specifications. There
are different types of controlled languages. Some are monolingual and
human-reader focused. Some are multilingual and machine-readable focused. Some,
like General Motors Global English, consist of 12 rules. Yet the General Motors
CASL controlled language has more specific goals and consists of 62 rules. I
taught Caterpillar CTE to technical authors and translators in the mid-90s, and
I would say that, given the dozens of specific linguistic rules to master, it
was far from being loosely specified.
I will remove that unnecessary comparison from the PDF. Instead I will write: "Instead of using a loosely controlled natural language specification, BabelCode uses ...".
Sharon O'Brien and Ursula Reuther both individually presented papers at EAMT/CLAW2003 on the comparison of different CL types and rules.
I've just downloaded and will read them. Thanks for the info.
Your document states:
====
I am dissatisfied with the principles, designs and availability of these
systems and therefore decided to conduct my own research and development of a
new one.
====
As for principles and designs, have you read the internal specifications of all
the existing systems?
Concerning availability, keep in mind that many of these industry-customized
systems and deployed systems at customer sites have been subject to proprietary
information conditions. The industrial and corporate players do not want to
fund a major project just to give away all the investment to others (including
competitors).
Availability is the biggest problem with these systems. I think there should be a freely available, general-purpose controlled translation solution on the Internet. UNL (www.undl.org) is also working toward this goal, but its licensing options, business model and published theoretical documentation don't satisfy me. Its proposed "enconversion" process does not sound efficient to me.
Your document states:
====
5. The Authoring and Translation Processes
5.1. Overview
The whole workflow of authoring and translation usually has two passes. In the
first pass (composing pass) the author composes the document in a formal
syntax, with no regard to the semantic ambiguity in word senses; then the
computer tries to automatically resolve semantic ambiguities using various WSD
approaches; then the author manually checks and corrects misinterpreted words
in the second pass (reviewing pass). Finally the computer automatically
generates target language translations using a generation engine and related
lexical databases.
====
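The workflow quoted above can be sketched as a small Python pipeline. This is only an illustration under stated assumptions, not the PDF's implementation: the sense inventory, the "first listed sense" guess standing in for real WSD, the word-for-word French lookup standing in for a generation engine, and all function names are invented for the example.

```python
# Toy sense inventory and target lexicon; all entries are illustrative only.
SENSES = {"bank": ["bank/finance", "bank/river"]}
TARGET_LEXICON = {"bank/finance": "banque", "bank/river": "rive"}


def compose_pass(words):
    """Pass 1, step 1: the author writes words with no sense annotations."""
    return [{"word": w, "sense": None, "confirmed": False} for w in words]


def auto_disambiguate(tokens):
    """Pass 1, step 2: the computer guesses a sense for every word.

    The guess here is simply the first listed sense, a stand-in for WSD.
    """
    for token in tokens:
        candidates = SENSES.get(token["word"], [token["word"]])
        token["sense"] = candidates[0]
    return tokens


def review_pass(tokens, corrections):
    """Pass 2: the author corrects wrong guesses and confirms the rest.

    corrections maps a token index to the sense the author chose.
    """
    for index, token in enumerate(tokens):
        if index in corrections:
            token["sense"] = corrections[index]
        token["confirmed"] = True
    return tokens


def generate(tokens, lexicon):
    """Final step: produce target text, but only from reviewed tokens."""
    if not all(t["confirmed"] for t in tokens):
        raise ValueError("review pass has not been completed")
    return " ".join(lexicon.get(t["sense"], t["sense"]) for t in tokens)


if __name__ == "__main__":
    tokens = auto_disambiguate(compose_pass(["bank"]))
    tokens = review_pass(tokens, {0: "bank/river"})
    print(generate(tokens, TARGET_LEXICON))
```

The point of the sketch is the ordering: generation refuses to run until the reviewing pass has confirmed every sense, which is exactly the extra author effort the productivity concern is about.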
Your idea of two passes might be good for researchers, but it will be difficult
to require highly stressed technical authors and editing teams, who need to
collectively produce 500-1000 pages of source language documentation per day,
to follow this. These users have monthly and yearly quotas to meet, and their
annual appraisal is based on those goals and quotas. If external requirements
mean producing less documentation, then it will be hard to get their buy-in.
I'm not planning to develop and sell industry-oriented controlled translation solutions. There can be optimization methods for high-speed domain-specific controlled authoring, such as a more limited vocabulary and set of sentence patterns, or accepting lower translation quality from looser writing rules and compensating later with post-editing. I'm trying to lead an open source project that guarantees publication quality and targets general users (website and document globalization for small businesses or groups; encouragement of email, instant messaging or usenet discussion between non-English speakers).
Two things to keep in mind: 1) users will only accept and use tools and
guidelines when they see the benefit for themselves, so make sure that your
implementation method will not slow down the users. And 2) you will need to
spend a lot of time and effort specifically showing the users on their own
texts how they can be more productive and effective with the tool and
methodology, so you need to be prepared to quickly become an expert in their
technical domain and prove to these very experienced users that you know their
work and tasks better than them.
As stated above, I'm not going to lobby industrial players... What I want to do is make available a working solution that everyone can play with and find useful. I believe in every field there should be two competing players (Windows vs. Linux, Democrats vs. Republicans, English vs. BabelCode...)
Implementing Controlled Language and/or Translation technologies is not easy
because language is not organized into perfect boxes. Language is very
ambiguous, even when we think we have supposedly mastered it. Lexical and
semantic ambiguity is just the beginning. Pragmatic aspects of ambiguity
(known as the "garbage in" syndrome) are the norm and keep us very busy beyond
what we had theoretically planned it should be like in the real world.
Pragmatic usages usually follow patterns that can be recognized and then converted to a corresponding "macro". Word-level pragmatic usages (metaphors) such as "He is a *black sheep*" may be treated as a "class::class" relationship (person::be(blacksheep) and animal::be(blacksheep)), or disambiguated by context-based WSD approaches such as the "Statistical WSD with Online BabelCode Corpus" described in the PDF. In any case, I will work more on the pragmatic issue.
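To make the idea of context-based statistical WSD concrete, here is a minimal Python sketch that picks the sense whose tagged examples share the most words with the current context. It is a generic co-occurrence count, not the method described in the PDF, and the tiny sense-tagged corpus, the sense labels and the function names are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical sense-tagged corpus: (sense label, context words seen with it).
TAGGED_CORPUS = [
    ("sheep/animal", ["farm", "wool", "flock"]),
    ("sheep/animal", ["field", "flock", "farmer"]),
    ("black_sheep/outcast", ["family", "he", "disgrace"]),
    ("black_sheep/outcast", ["he", "brother", "family"]),
]


def build_counts(corpus):
    """Count how often each context word was seen with each sense."""
    counts = {}
    for sense, context in corpus:
        counts.setdefault(sense, Counter()).update(context)
    return counts


def pick_sense(candidates, context, counts):
    """Return the candidate sense with the highest co-occurrence score.

    On a tie, the candidate listed first wins.
    """
    def score(sense):
        seen = counts.get(sense, Counter())
        return sum(seen[word] for word in context)

    return max(candidates, key=score)


if __name__ == "__main__":
    counts = build_counts(TAGGED_CORPUS)
    candidates = ["sheep/animal", "black_sheep/outcast"]
    print(pick_sense(candidates, ["he", "is", "family"], counts))
```

A scheme like this only proposes a sense; in the two-pass workflow the author would still confirm or correct it during review.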
Your theoretical approach has some good points to it, yet be careful not to underestimate the real implementation of it for users in business and industrial contexts.
I am doing a working general-purpose solution, although industry-specific optimizations can be implemented by user-defined namespaces, macros and class relationship knowledge.
I am working on an interlingua spec, an Englet spec, an Englet IAE and a Chinese generation engine. Other language modules are expected to be implemented by volunteer developers, but I will provide general developer support. Some fancy things, such as the "BabelCode-aware search engine" designed for statistical WSD, may not be implemented by me. Like Linus's role in the Linux community, I should stick with the kernel only...
Regards,
Jeff Allen
Paris, France
[EMAIL PROTECTED]
Jeff's CL/MT/Speech technology web portal: http://www.geocities.com/jeffallenpubs/
_______________________________________________ MT-List mailing list [EMAIL PROTECTED] http://www.computing.dcu.ie/mailman/listinfo/mt-list
