Quoting ByteCool Software <[EMAIL PROTECTED]>:

> Dear All,
> BabelCode Project has compiled a pdf describing its latest research in
> controlled language authoring and translation. Read it at
> http://www.babelcode.org/babelcode.pdf
> Best Regards,
> Yao
> http://www.babelcode.org
Yao,

You mention that the document babelcode.pdf v1.0 (dated June 4, 2004) contains all that is necessary to know about Controlled Language and Machine Translation. I'm not sure that this is accurate. See my comments below, based on extracts from the document.

Your document states:
====
Chapter 4 The Design of a Formalized Natural Language discusses BabelCode's unique ideas and design of formalized natural languages
====

I would not say that the BabelCode approach is unique. It is a thematic/semantic role approach. Some Controlled Language systems, already described years ago in internal team presentations, function in much the same way as your recent one.

Your document states:
====
4. The Design of a Formalized Natural Language
4.1. Overview
Unlike other controlled translation solutions that use a loosely controlled source natural language specification, BabelCode uses a syntactically formalized natural language design which is more like a programming language such as C/C++. This eliminates all the syntactic ambiguities in the very beginning, and allows the human author and the computer to focus on semantic expression and disambiguation.
====

I do not know where you derive the idea that "all" other controlled translation solutions use loosely controlled source natural language specifications. There are different types of controlled languages. Some are monolingual and focused on the human reader. Some are multilingual and focused on machine readability. Some, like General Motors Global English, consist of 12 rules. Yet the General Motors CASL controlled language has more specific goals and consists of 62 rules. I taught Caterpillar CTE to technical authors and translators in the mid-90s, and given the dozens of specific linguistic rules to master, I would say that it was far from being loosely specified. Sharon O'Brien and Ursula Reuther each presented papers at EAMT/CLAW2003 comparing different CL types and rules.
Your document states:
====
I am dissatisfied with the principles, designs and availability of these systems and therefore decided to conduct my own research and development of a new one.
====

As for principles and designs, have you read the internal specifications of all the existing systems? Concerning availability, keep in mind that many of these industry-customized systems, and the systems deployed at customer sites, are subject to proprietary information conditions. The industrial and corporate players do not want to fund a major project just to give away all the investment to others (including competitors).

Your document states:
====
5. The Authoring and Translation Processes
5.1. Overview
The whole workflow of authoring and translation usually has two passes. In the first pass (composing pass) the author composes the document in a formal syntax, with no regard to the semantic ambiguity in word senses; then the computer tries to automatically resolve semantic ambiguities using various WSD approaches; then the author manually checks and corrects misinterpreted words in the second pass (reviewing pass). Finally the computer automatically generates target language translations using a generation engine and related lexical databases.
====

Your idea of two passes might be good for researchers, but it will be difficult to require this of highly stressed technical authors and editing teams who need to collectively produce 500-1000 pages of source language documentation per day. These users have monthly and yearly quotas to meet, and their annual appraisal is based on those goals and quotas. If external requirements mean producing less documentation, then it will be hard to get their buy-in.

Two things to keep in mind:

1) Users will only accept and use tools and guidelines when they see the benefit for themselves, so make sure that your implementation method will not slow down the users.

2) You will need to spend a lot of time and effort showing the users, on their own texts, how they can be more productive and effective with the tool and methodology. So you need to be prepared to quickly become an expert in their technical domain and prove to these very experienced users that you know their work and tasks better than they do.

Implementing Controlled Language and/or Translation technologies is not easy because language is not organized into perfect boxes. Language is very ambiguous, even when we think we have mastered it. Lexical and semantic ambiguity is just the beginning. Pragmatic aspects of ambiguity (known as the "garbage in" syndrome) are the norm, and in the real world they keep us far busier than our theoretical plans anticipated.

Your theoretical approach has some good points, yet be careful not to underestimate what it takes to implement it for real users in business and industrial contexts.

Regards,

Jeff Allen
Paris, France
[EMAIL PROTECTED]
Jeff's CL/MT/Speech technology web portal:
http://www.geocities.com/jeffallenpubs/

_______________________________________________
MT-List mailing list
[EMAIL PROTECTED]
http://www.computing.dcu.ie/mailman/listinfo/mt-list
