[Mt-list] Call for contributions: Machine Learning for Multilingual Information Access (NIPS 2006 Workshop)

Cyril Goutte Wed, 04 Oct 2006 12:18:29 -0700

Readers of the MT-list may be interested in the following call:

---


Call for contributions

NIPS 2006 Workshop

MACHINE LEARNING FOR MULTILINGUAL INFORMATION ACCESS
====================================================

http://ilt.iit.nrc.ca/MLIA/

Description:
------------
In many different settings, accessing information available in
different languages is a challenge.

In Europe, the wide variety of languages is clearly a bottleneck for
efficient circulation and access to information. More than half of EU
citizens cannot hold a conversation in a language other than their
mother tongue. Even in an officially bilingual country like Canada,
fewer than one in five people are considered to have a good enough
command of both official languages (2001 census data).

The traditional paradigm for addressing this issue is to perform human
translation on a massive scale, and rely on monolingual information
access technology. Although this model has worked reasonably well in
the past, the rapid increase in the amount of information produced
(and, in Europe, in the number of languages covered) raises questions
as to its sustainability. Machine Learning has the potential to help
develop and deploy technology that provides:

   1. access to information across different languages,
   2. usable translation from one language to another.

We are interested in Machine Learning techniques that address, for
example, the following problems:

  * Word alignment
  * Machine translation
  * Multilingual lexicon and terminology extraction
  * Cross-lingual information retrieval
  * Cross-lingual categorisation


Goals of the workshop:
----------------------

Multilingual applications are also emerging as a promising application
area for some Machine Learning techniques, for example the use of
Kernel CCA for cross-language applications, or large-margin approaches
to word alignment. This new trend converges with the well-established
interest of the Natural Language Processing community in learning
approaches.

The purpose of this workshop is to provide a forum for discussion of
current developments at the intersection between multilingual
processing and machine learning. This includes developing new
techniques to address various multilingual information access problems
(e.g. translation), but also scaling up existing techniques to the
available NLP data, developing tools for cross-language information
retrieval, etc.

We will promote discussions of some inter-related key issues in
applying Machine Learning to Multilingual problems:

* SCALING UP:
  - Applying ML to 100-million-word corpora (e.g. SMT)
  - Deploying ML solutions on new language pairs

* SCARCE RESOURCES:
  - Languages or domains with limited bilingual corpora
  - Bootstrapping limited resources

* EVALUATION:
  - Design of better performance measures
  - Optimisation of application-specific measures
  - Learning human evaluation

* PRIOR LINGUISTIC KNOWLEDGE:
  - Modelling and using linguistic knowledge in ML
  - The continuum between all-data (SMT) and all prior knowledge
    (handcrafted rules)


Submission instructions:
------------------------

Researchers interested in presenting their work at the workshop should
send an email to: mlia (at) nrc-cnrc.gc.ca (preferably plain
text) with the following information:
- Title
- Author(s)
- Abstract (around 1 page)

Schedule:
Submission deadline: 29 October 2006
Notification: 6 November 2006
Workshop date: 8 or 9 December 2006


Co-organisers:
--------------
Cyril Goutte, National Research Council Canada
Nicola Cancedda, Xerox Research Centre Europe
Marc Dymetman, Xerox Research Centre Europe
George Foster, National Research Council Canada


Workshop format:
----------------
We intend to leave a good part of the workshop to panel discussions
that will address relevant topics in multilingual information access
(MIA), as well as invited talks presenting some important MIA problems
and associated challenges for Machine Learning. For each half day, we
will start with either a keynote or a short tutorial, continue with a
few shorter technical presentations, and end with a panel discussion
(topics to be decided depending on the confirmed list of speakers).

Invited speakers:
- Dan Melamed (Courant Institute, NYU)
- John Shawe-Taylor (ECS, U. of Southampton, UK), tbc
- Ralf Steinberger (JRC, Ispra, Italy)
- Wray Buntine (HIIT, Helsinki, Finland), tbc


Related work:
-------------
Past NIPS workshops have addressed related topics such as learning
with structured data, or the use of Machine Learning for Natural
Language Processing. There is also some ongoing interest within the
European network of excellence Pascal, as exemplified by the recent
workshop on intelligent information access. However, none of these
specifically targets multilingual aspects. We believe there is
sufficient interest in, and genuine need for, work on this particular
aspect to justify a specific focus on multilingual information access.  The
newly started European project SMART (Statistical Multilingual
Analysis for Retrieval and Translation) is specifically targeting
advanced machine learning techniques for multilingual applications.

