Title: Re: [Mt-list] status on the UNL project
Dear MachTians,                                         7/8/05

At 21:30 +0200 6/08/05, Jeff Allen wrote:
Does anyone know the status on the UNL project concerning controlled language writing and translation?
 
Jeff
 
Jeff Allen
Program Manager, Telecom Product Division, Mycom France
Advisor, MultiLingual Computing & Technology magazine
Advisor, LINGUIST List
Paris, France
e-mail:
[EMAIL PROTECTED] or [EMAIL PROTECTED]
Language Technology Software Review site: http://www.geocities.com/langtecheval/

Concerning the UNL project:
- it is not an MT project in the classical sense of the term,
- and it never considered that input should be a controlled language.

It is a project about interlingual communication using UNL as a "pivot": an "anglosemantic" language of semantic hypergraphs with its own vocabulary of UWs, semantic relations, and interlingual attributes.

UW means "universal word", but I prefer the more down-to-earth "unit of virtual vocabulary". The UW symbols are of the form
<English lemma or term> ['(' <list of restrictions> ')']
ex:
book(icl>thing)
book(icl>do, agt>human, obj>thing)
book
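
The symbol form above can be read mechanically. Here is a minimal sketch in Python, assuming only the flat shape shown in these three examples (a headword, then an optional comma-separated list of relation>value restrictions). The names UW and parse_uw are hypothetical, made up for illustration, and are not part of any UNL tool or of the specification.

```python
import re
from typing import NamedTuple, Tuple


class UW(NamedTuple):
    """A parsed UW symbol (hypothetical helper type, for illustration only)."""
    headword: str
    restrictions: Tuple[Tuple[str, str], ...]


# Headword, then an optional parenthesised restriction list.
_UW_RE = re.compile(r"^\s*([^()]+?)\s*(?:\((.*)\))?\s*$")


def parse_uw(symbol: str) -> UW:
    """Split a UW symbol into its headword and (relation, value) pairs."""
    match = _UW_RE.match(symbol)
    if match is None:
        raise ValueError(f"not a UW symbol: {symbol!r}")
    headword, body = match.group(1), match.group(2)
    restrictions = []
    if body:
        for item in body.split(","):
            # Each restriction is written relation>value, e.g. icl>thing.
            relation, sep, value = item.strip().partition(">")
            if not sep:
                raise ValueError(f"malformed restriction: {item!r}")
            restrictions.append((relation, value))
    return UW(headword, tuple(restrictions))


for example in ("book(icl>thing)", "book(icl>do, agt>human, obj>thing)", "book"):
    print(parse_uw(example))
```

An unrestricted symbol such as "book" simply yields an empty restriction list, which matches the case where a UW carries no restriction at all.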

In ideal cases, a UW denotes a unique word sense (or term sense) WS shared by all languages.  In usual cases, a UW may denote, for example, 2 WS in French and one in English, as with "river(icl>thing)" -- rivière and fleuve. In extreme cases, admitted by the UNL specification, a UW may have no restriction (book) and denote all WS of all lemmas with this citation form (verb, noun).

More in the papers assembled in the book prepared for the last UNL workshop at CICLING-05 (Mexico).

Concerning applications of UNL, the idea is to combine the following components in many possible ways:
- enconversion (NLX2UNL), which may be interactive
- deconversion (UNL2NLX), which may be interactive
- direct or indirect modification
- UNL-based processes (e.g. for gisting, summarizing, extracting info...).

The terms "enconversion" and "deconversion" are deliberately different from "analysis" and "generation", to stress that the corresponding operations change the "lexical space" of the representations of utterances. In other words, each contains a lexical transfer, while analysis or generation (possibly) changes only the level of abstraction of the vocabulary (form, lemma, derivational family (DF), lemma+WS, DF+WS).
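
To make the distinction concrete, here is a toy sketch in Python. The two small dictionaries and the function names (analyse, enconvert_lemma, deconvert_uw) are invented for illustration and are in no way the actual UNL enconverters or deconverters. Analysis stays inside the French lexical space (form to lemma), while enconversion and deconversion cross between French lemmas and UWs.

```python
# Toy data, assumed for illustration only.
FRENCH_FORM_TO_LEMMA = {
    "rivières": "rivière",
    "fleuves": "fleuve",
    "livres": "livre",
}
FRENCH_LEMMA_TO_UW = {
    "rivière": "river(icl>thing)",
    "fleuve": "river(icl>thing)",
    "livre": "book(icl>thing)",
}


def analyse(form: str) -> str:
    """Analysis: change the abstraction level, staying in French."""
    return FRENCH_FORM_TO_LEMMA[form]


def enconvert_lemma(lemma: str) -> str:
    """Enconversion step: lexical transfer from a French lemma to a UW."""
    return FRENCH_LEMMA_TO_UW[lemma]


def deconvert_uw(uw: str) -> list:
    """Deconversion step: lexical transfer from a UW to French candidates."""
    return sorted(lemma for lemma, target in FRENCH_LEMMA_TO_UW.items() if target == uw)


print(analyse("rivières"))               # rivière
print(enconvert_lemma("rivière"))        # river(icl>thing)
print(deconvert_uw("river(icl>thing)"))  # ['fleuve', 'rivière']
```

The last call returns two candidates, which is one reason why deconversion may have to be interactive: something has to choose between them.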

The project went into low gear when funding dwindled due to the crisis in Japan, and also because of organizational inadequacy. The good news is that a UNL Consortium, inspired by the W3C, is being set up to promote the technical development of UNL and associated tools.

The main figures in starting this consortium are Jesus Cardeñosa and Igor Boguslavskij: Jesus Cardeñosa <[EMAIL PROTECTED]> or <[EMAIL PROTECTED]>, Igor Boguslavskij <[EMAIL PROTECTED]>.

UNL groups actively involved in the UNL Consortium at this point are from Brazil (RNP), France (GETA), Italy (ILC), Russia (IPPI), Spain (UPM), and Thailand (NecTec/NICT).

Other active UNL groups are in China (NLPR and XiaMen), Indonesia (BPPT), Japan (UNU), and Jordan (RSS and Al-Zaitounah).

There is some joint work on a "UNL encyclopedia" and ontologies between UNL-Japan (Uchida, founder of UNL) and a group at the University of Geneva.

We are working with the Spanish and Russian groups to estimate the viability of the UNL approach on the Babel part of the UNESCO web site (textual content is about 150 "standard pages" or 3000 sentences):
- the material is translated into Spanish, French, and Russian by whatever means available, and human effort is recorded (it takes me 20 min per page using Systran pretranslations, for example). We also have it in Chinese.
- the UNL graphs are produced (again, manually or semi-automatically). Manually, that takes less than 4 hours per page.
- UNL graphs are deconverted into the 3 target languages (automatically)
- Human effort to get good translations from the outputs is again recorded (for whatever fraction can be handled with available resources).
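
For a rough sense of scale, the figures quoted above can be multiplied out. This is a back-of-the-envelope sketch in Python, assuming the per-page figures apply uniformly to all 150 pages and counting a single pass only (one target language for post-editing, one graph per page). The variable names are just labels for the numbers given in this message.

```python
PAGES = 150                    # "standard pages" of textual content
SENTENCES = 3000               # sentences in the same content
POSTEDIT_MIN_PER_PAGE = 20     # one person's figure, with Systran pretranslations
GRAPH_HOURS_PER_PAGE_MAX = 4   # manual UNL graph production, upper bound

sentences_per_page = SENTENCES / PAGES
postedit_hours = PAGES * POSTEDIT_MIN_PER_PAGE / 60
graph_hours_max = PAGES * GRAPH_HOURS_PER_PAGE_MAX

print(f"sentences per page:          {sentences_per_page:.0f}")   # 20
print(f"post-editing, one language:  {postedit_hours:.0f} hours")  # 50 hours
print(f"manual UNL graphs, at most:  {graph_hours_max} hours")     # 600 hours
```

Under these assumptions, manual graph production is bounded by about 12 times the post-editing effort for one language, but the graphs are produced once and then deconverted automatically into each target language.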

Best,

Ch.Boitet
_______________________________________________
Mt-list mailing list
