Dear MachTians,
7/8/05
At 21:30 +0200 6/08/05, Jeff Allen wrote:
Does anyone know the status on the UNL project concerning controlled language writing and translation?
Jeff
Jeff Allen
Program Manager, Telecom Product Division, Mycom France
Advisor, MultiLingual Computing & Technology magazine
Advisor, LINGUIST List
Paris, France
e-mail: [EMAIL PROTECTED] or [EMAIL PROTECTED]
Language Technology Software Review site: http://www.geocities.com/langtecheval/
Concerning the UNL project:
- it is not an MT project in the classical sense of the
term,
- and it never considered that input should be a controlled
language.
It is a project about interlingual communication using as
"pivot" UNL, an "anglosemantic" language of
semantic hypergraphs using a proper vocabulary of UWs, semantic
relations, and interlingual attributes.
UW means "universal word", but I prefer the more down
to earth "unit of virtual vocabulary". The UW symbols are of
the form
<English lemma or term> ['(' <list of restrictions> ')']
ex:
book(icl>thing)
book(icl>do, agt>human, obj>thing)
book
In ideal cases, UW denotes a unique word sense (or term sense) WS
shared by all languages. In usual cases, a UW may denote, for
example, 2 WS in French and one in English for
"river(icl>thing)" -- rivière and fleuve. In
extreme cases, admitted by the UNL specification, a UW may have no
restriction (book) and denote all WS of all lemmas of this citation
form (verb, noun).
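To make the UW notation concrete, here is a minimal Python sketch of a parser for the form shown above. The names `UniversalWord` and `parse_uw` and the regular expression are illustrative assumptions, not part of the UNL specification or of any UNL tool, and the sketch handles only a flat, comma-separated restriction list.

```python
import re
from dataclasses import dataclass, field

# Hypothetical pattern for "<lemma> ['(' <list of restrictions> ')']".
# It is an illustration only and is not taken from the UNL specification.
_UW_PATTERN = re.compile(
    r"^\s*(?P<lemma>[^()]+?)\s*(?:\((?P<restrictions>[^()]*)\))?\s*$"
)


@dataclass
class UniversalWord:
    """A UW symbol split into its lemma and its (relation, value) restrictions."""
    lemma: str
    restrictions: list = field(default_factory=list)


def parse_uw(symbol: str) -> UniversalWord:
    """Parse a UW symbol such as 'book(icl>do, agt>human, obj>thing)'."""
    match = _UW_PATTERN.match(symbol)
    if match is None:
        raise ValueError(f"not a well-formed UW symbol: {symbol!r}")
    raw = match.group("restrictions")
    restrictions = []
    if raw is not None:
        # Each restriction is assumed to have the shape "relation>value".
        for item in raw.split(","):
            relation, sep, value = item.strip().partition(">")
            if not sep or not relation.strip() or not value.strip():
                raise ValueError(f"malformed restriction {item!r} in {symbol!r}")
            restrictions.append((relation.strip(), value.strip()))
    return UniversalWord(match.group("lemma"), restrictions)


if __name__ == "__main__":
    for example in ("book(icl>thing)", "book(icl>do, agt>human, obj>thing)", "book"):
        print(parse_uw(example))
```

An unrestricted UW such as "book" comes back with an empty restriction list, which mirrors the extreme case described above where the symbol leaves the word sense open.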
More in the papers assembled in the book prepared for the last
UNL workshop at CICLING-05 (Mexico).
Concerning applications of UNL, the idea is to combine the following
components in many possible ways:
- enconversion (NLX2UNL), which may be interactive
- deconversion (UNL2NLX), which may be interactive
- direct or indirect modification
- UNL-based processes (e.g. for gisting, summarizing, extracting
info).
The terms "enconversion" and "deconversion"
are deliberately different from "analysis" and
"generation" to stress that the corresponding operations
change the "lexical space" of the representations of
utterances. In other words, each contains a lexical transfer, while in
analysis or generation, one (possibly) changes the level of
abstraction of the vocabulary (form, lemma, derivational family (DF),
lemma+WS, DF+WS).
The project went into low gear when funding dwindled due to
the crisis in Japan, and also because of organizational inadequacy.
The good news is that a UNL Consortium, inspired by the W3C, is being set
up to promote the technical development of UNL and associated
tools.
The main figures in starting this consortium are Jesus
Cardeñosa and Igor Boguslavskij: Jesus Cardeñosa
<[EMAIL PROTECTED]>, "Jesús Cardeñosa"
<[EMAIL PROTECTED]>, Igor Boguslavskij <[EMAIL PROTECTED]>.
UNL groups actively involved in the UNL Consortium at this point
are from Brazil (RNP), France (GETA), Italy (ILC), Russia (IPPI),
Spain (UPM), and Thailand (NecTec/NICT).
Other active UNL groups are in China (NLPR and XiaMen), Indonesia
(BPPT), Japan (UNU), and Jordan (RSS and Al-Zaitounah).
There is some work on "UNL encyclopedia" and ontologies
between UNL-Japan (Uchida, founder of UNL) and a group at the
University of Geneva.
We are working with the Spanish and Russian groups to estimate the
viability of the UNL approach on the Babel part of the Unesco web site
(textual content is about 150 "standard pages" or 3000
sentences):
- the material is translated into Spanish, French, and Russian by
whatever means available, and human effort is recorded (it takes me
20 min per page using Systran pretranslations, for example). We also
have it in Chinese.
- the UNL graphs are produced (again, manually or
semi-automatically). Manually, that takes less than 4 hours per
page.
- UNL graphs are deconverted into the 3 target languages
(automatically)
- Human effort to get good translations from the outputs is again
recorded (for whatever fraction can be handled with available
resources).
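To give an order of magnitude for the effort figures quoted above, here is a small Python sketch of the arithmetic. It assumes that the 20 min per page and the "less than 4 hours per page" figures apply uniformly to all 150 pages, which is an extrapolation from the numbers in this message and not a measured result. The function name `effort_estimates` is purely illustrative.

```python
# Figures quoted in the message above.
PAGES = 150                        # "standard pages" in the Babel part of the site
SENTENCES = 3000                   # approximate sentence count
POSTEDIT_MIN_PER_PAGE = 20         # revising Systran pretranslations
MANUAL_GRAPH_HOURS_PER_PAGE = 4    # upper bound for fully manual UNL graphs


def effort_estimates() -> dict:
    """Extrapolate the per-page figures to the whole corpus (an assumption)."""
    return {
        "sentences_per_page": SENTENCES / PAGES,
        "postedit_hours_per_language": PAGES * POSTEDIT_MIN_PER_PAGE / 60,
        "manual_graph_hours_upper_bound": PAGES * MANUAL_GRAPH_HOURS_PER_PAGE,
    }


if __name__ == "__main__":
    for name, value in effort_estimates().items():
        print(f"{name}: {value}")
```

Under these assumptions, revising pretranslations costs about 50 hours per target language, against an upper bound of 600 hours to produce the graphs fully by hand. The graphs are produced once and shared by all target languages, and semi-automatic production should bring the figure down.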
Best,
Ch.Boitet
_______________________________________________ Mt-list mailing list
