Re: [Mt-list] All you need to know about controlled language authoring and trans

ByteCool Software Thu, 24 Jun 2004 11:10:55 -0700

Hi Jeff,

Sorry for the late response. Please see below for replies.

From: Jeff Allen <[EMAIL PROTECTED]>
To: ByteCool Software <[EMAIL PROTECTED]>
CC: [EMAIL PROTECTED]
Subject: Re: [Mt-list] All you need to know about controlled language authoring and translation
Date: Wed, 9 Jun 2004 22:59:38 +0200
Quoting ByteCool Software <[EMAIL PROTECTED]>:
> Dear All,
> BabelCode Project has compiled a pdf describing its latest research in
> controlled language authoring and translation. Read it at
> http://www.babelcode.org/babelcode.pdf
> Best Regards,
> Yao
> http://www.babelcode.org
Yao,
You mention that the document babelcode.pdf v1.0 (dated June 4, 2004) contains all that is necessary to know about Controlled Language and Machine Translation. I'm not sure that this is accurate. See my comments below based on extracts from the document.

I don't think I missed any major issue in outlining the architecture of the BabelCode solution, so this PDF can at least serve as introductory reading for both laymen and experts in this field. The advertisement "everything you need to know about controlled translation and BabelCode" was meant to attract the widest possible readership...

Your document states:
====
Chapter 4 The Design of a Formalized Natural Language discusses BabelCode's
unique ideas and design of formalized natural languages
====
I would not say that the BabelCode approach is unique. It is a thematic/semantic role approach. The functioning of some Controlled language systems, already described years ago in internal team presentations, does not differ much from your recent one.

I agree. I knew that it was the thematic/semantic role approach used in interlingua representation practices. I just thought the C++-like "<object>.<method>(arguments)" syntax paradigm looked novel in controlled languages. Maybe it doesn't count as a "theoretical innovation". I will remove this claim in the next version of the PDF.
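
To make the paradigm concrete, here is a minimal Python sketch of how a statement in the "<object>.<method>(arguments)" shape might be split into thematic roles. It is not taken from the BabelCode PDF: the regular expression, the Statement fields and the example sentence are all assumptions made for illustration, and the real grammar is certainly richer.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for one "<object>.<method>(arguments)" statement.
# This is NOT the BabelCode grammar, only an illustration of the paradigm.
STATEMENT = re.compile(r"^\s*(\w+)\.(\w+)\((.*)\)\s*$")


@dataclass
class Statement:
    agent: str       # the <object>, read here as the agent/subject role
    predicate: str   # the <method>, read here as the verb
    arguments: list  # the remaining roles (patient, instrument, ...)


def parse_statement(text: str) -> Statement:
    """Split one statement into agent, predicate and argument roles."""
    match = STATEMENT.match(text)
    if match is None:
        raise ValueError(f"not a well-formed statement: {text!r}")
    agent, predicate, raw_args = match.groups()
    # Arguments are comma-separated; an empty list is allowed.
    arguments = [a.strip() for a in raw_args.split(",") if a.strip()]
    return Statement(agent, predicate, arguments)
```

With this sketch, parse_statement("cat.eat(fish)") yields the agent "cat", the predicate "eat" and the single argument "fish", while free text such as "cat eats fish" is rejected. That rejection is the point of a formal syntax: there is only one way to read a well-formed statement.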

Your document states:
====
4. The Design of a Formalized Natural Language
4.1. Overview
Unlike other controlled translation solutions that use a loosely controlled source natural language specification, BabelCode uses a syntactically formalized natural language design which is more like a programming language such as C/C++. This eliminates all the syntactic ambiguities in the very beginning, and allows the human author and the computer to focus on semantic expression and disambiguation.
====

I do not know where you derive the idea that "all" other controlled translation solutions use loosely controlled source natural language specifications. There are different types of controlled languages. Some are monolingual and human-reader focused. Some are multilingual and machine-readable focused. Some, like General Motors Global English, consist of 12 rules. Yet the General Motors CASL controlled language has more specific goals and consists of 62 rules. I taught Caterpillar CTE to technical authors and translators in the mid-90s, and I would say, based on the dozens of specific linguistic rules to master, that it was far from being loosely specified.

I will remove that unnecessary comparison from the PDF. Instead I will write: "Instead of using a loosely controlled natural language specification, BabelCode uses ...".

Sharon O'Brien and Ursula Reuther both individually presented papers at
EAMT/CLAW2003 on the comparison of different CL types and rules.


I've just downloaded and will read them. Thanks for the info.

Your document states:
====
I am dissatisfied with the principles, designs and availability of these systems and therefore decided to conduct my own research and development of a new one.
====

As for principles and designs, have you read the internal specifications of all the existing systems? Concerning availability, keep in mind that many of these industry-customized systems and deployed systems at customer sites have been subject to proprietary information conditions. The industrial and corporate players do not want to fund a major project just to give away all the investment to others (including competitors).

Availability is the biggest problem with these systems. I think there should be a freely available, general-purpose controlled translation solution on the Internet. UNL (www.undl.org) is also working toward this goal, but its licensing options, business model and published theoretical documentation don't satisfy me. Its proposed "enconversion" process does not sound efficient to me.

Your document states:
====
5. The Authoring and Translation Processes
5.1. Overview
The whole workflow of authoring and translation usually has two passes. In the first pass (composing pass) the author composes the document in a formal syntax, with no regard to the semantic ambiguity in word senses; then the computer tries to automatically resolve semantic ambiguities using various WSD approaches; then the author manually checks and corrects misinterpreted words in the second pass (reviewing pass). Finally the computer automatically generates target language translations using a generation engine and related lexical databases.
====
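
The two-pass workflow quoted above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the BabelCode implementation: the sense inventory, the function names and the "pick the first listed sense" rule (a placeholder standing in for a real WSD method) are all hypothetical.

```python
from dataclasses import dataclass

# Toy sense inventory (hypothetical data, not from any BabelCode lexicon).
SENSES = {
    "bank": ["bank/finance", "bank/river"],
    "money": ["money/currency"],
}


@dataclass
class Token:
    word: str
    sense: str = ""
    needs_review: bool = False


def auto_disambiguate(words):
    """Automatic step: assign a sense to every word.

    Placeholder rule: take the first listed sense and flag any word
    that has more than one candidate, so the author can check it.
    """
    tokens = []
    for word in words:
        candidates = SENSES.get(word, [word])
        tokens.append(Token(word, candidates[0], len(candidates) > 1))
    return tokens


def review(tokens, corrections):
    """Reviewing pass: the author overrides senses by token position."""
    for index, sense in corrections.items():
        tokens[index].sense = sense
        tokens[index].needs_review = False
    return tokens


def generate(tokens, lexicon):
    """Generation: map each resolved sense to a target-language word."""
    return " ".join(lexicon.get(t.sense, t.word) for t in tokens)
```

In this sketch the author's effort in the second pass is limited to the tokens flagged with needs_review, which is where the cost of the reviewing pass would be concentrated.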

Your idea of two passes might be good for researchers, but it will be difficult to require highly stressed technical authors and editing teams, who need to collectively produce 500-1000 pages of source language documentation per day, to follow this. These users have monthly and yearly quotas to meet, and their annual appraisal is based on those goals and quotas. If external requirements mean producing less documentation, then it will be hard to get their buy-in.

I'm not planning to develop and sell industry-oriented controlled translation solutions. There can be optimization methods for high-speed domain-specific controlled authoring, such as a more limited vocabulary and set of sentence patterns, or looser writing rules whose lower translation quality is compensated later by post-editing. I'm trying to lead an open source project that guarantees publication quality and targets general users (website and document globalization for small businesses or groups; encouragement of email, instant messaging or usenet discussion between non-English speakers).

Two things to keep in mind: 1) users will only accept and use tools and guidelines when they see the benefit for themselves, so make sure that your implementation method will not slow down the users. And 2) you will need to spend a lot of time and effort specifically showing the users on their own texts how they can be more productive and effective with the tool and methodology, so you need to be prepared to quickly become an expert in their technical domain and prove to these very experienced users that you know their work and tasks better than them.

As stated above, I'm not going to lobby industrial players... What I want to do is make available a working solution that everyone can play with and find useful. I believe in every field there should be two competing players (Windows vs. Linux, Democrats vs. Republicans, English vs. BabelCode...)

Implementing Controlled Language and/or Translation technologies is not easy because language is not organized into perfect boxes. Language is very ambiguous, even when we think we have supposedly mastered it. Lexical and semantic ambiguity is just the beginning. Pragmatic aspects of ambiguity (known as the "garbage in" syndrome) are the norm and keep us very busy beyond what we had theoretically planned it should be like in the real world.

Pragmatic usages usually follow recognizable patterns that can be converted to a corresponding "macro". Word-level pragmatic usages (metaphors) such as "He is a *black sheep*" may be resolved as a "class::class" relationship (person::be(blacksheep) and animal::be(blacksheep)), or disambiguated by context-based WSD approaches such as the "Statistical WSD with Online BabelCode Corpus" described in the PDF. In any case, I will work more on the pragmatic issue.


Your theoretical approach has some good points to it, yet be careful not to
underestimate the real implementation of it for users in business and
industrial contexts.

I am building a working general-purpose solution, although industry-specific optimizations can be implemented through user-defined namespaces, macros and class relationship knowledge.

I am working on an interlingua spec, an Englet spec, an Englet IAE and a Chinese generation engine. Other language modules are expected to be implemented by volunteer developers, but I will provide general developer support. Some fancy things, such as the "BabelCode-aware search engine" designed for statistical WSD, may not be implemented by me. Like Linus's role in the Linux community, I should stick with the kernel only...

Regards,

Jeff Allen
Paris, France
[EMAIL PROTECTED]
Jeff's CL/MT/Speech technology web portal:
http://www.geocities.com/jeffallenpubs/

_______________________________________________
MT-List mailing list
[EMAIL PROTECTED]
http://www.computing.dcu.ie/mailman/listinfo/mt-list

