** Call for Participation **

                COLING 2004 WORKSHOP ON
COMPUTATIONAL APPROACHES TO ARABIC SCRIPT-BASED LANGUAGES

                 Geneva, Switzerland, 23-27 August 2004
             Invited Speaker: Martin Kay (Stanford University)
              http://members.cox.net/karinem/COLING2004



WORKSHOP THEME 

Recently, there has been a surge of interest in the study of the languages of the 
Middle East, especially Arabic, Persian (Farsi), Pashto, Kurdish and Urdu. The usage 
of the Arabic script gives rise to certain issues that are common to all these 
languages even though they belong to distinct language families. Hence, these languages 
share properties such as the absence of capitalization, right-to-left direction, lack 
of clear word boundaries, complex word structure, a high degree of ambiguity due to 
non-representation of short vowels in the writing system, and related encoding issues. 
Yet the research on these various languages has rarely been brought together in a 
single forum, and most development has been the result of initiatives by individual 
research establishments or industry firms. 

The goal of this workshop is to provide a forum for those involved in the development 
of NLP systems in Arabic script languages to exchange ideas, approaches and 
implementations of computational systems; to discuss the common challenges faced by 
all practitioners; and to assess the state of the art in the field. In addition, one 
of the aims of the workshop is to identify promising areas for future collaborative 
research in the development of NLP systems for Arabic script languages. 


WORKSHOP PROGRAM 

I. Opening and Overview
8:30-9:00 Computer Processing of Arabic Script-based Languages: Current State and 
Future Directions - Ali Farghaly 

II. Session 1: Lexicon and Corpora
9:00-9:30 Developing an Arabic Treebank: Methods, Guidelines, Procedures, and Tools - 
Mohamed Maamouri and Ann Bies 
9:30-10:00 Preliminary Lexical Framework for English-Arabic Semantic Resource 
Construction - Anne R. Diekema 
10:00-10:30 The Architecture of a Standard Arabic Lexical Database: Some Figures, 
Ratios, and Categories from the DIINAR.1 Source Program - Ramzi Abbès, Joseph Dichy 
and Mohamed Hassoun 

10:30-10:45 Break

III. Session 2: Morphology
10:45-11:15 Systematic Verb Stem Generation for Arabic - Jim Yaghi and Sane Yagi 
11:15-11:45 Issues in Arabic Orthography and Morphology Analysis - Tim Buckwalter 
11:45-12:15 Finite-State Morphological Analysis of Persian - Karine Megerdoomian 

12:15-2:00 Lunch & Demo Sessions

IV. Demonstrations 
Urdu Localization Project - Sarmad Hussain 
FarsiSum: A Persian Text Summarizer - Martin Hassel and Nima Mazdak 
Stemming the Qur'an - Naglaa Thabet 
Language Weaver Arabic->English MT - Daniel Marcu, Alex Fraser, William Wong and Kevin 
Knight 

V. Invited Speaker  
2:00-2:45 Arabic Script-Based Languages Deserve to be Studied Linguistically - Martin 
Kay 

VI. Session 3: Statistical Approaches
2:45-3:15 An Unsupervised Approach for Bootstrapping Arabic Sense Tagging - Mona T. 
Diab 
3:15-3:45 Automatic Arabic Document Categorization Based on the Naive Bayes Algorithm 
- Mohamed El Kourdi, Amine Bensaid and Tajje-eddine Rachidi 

3:45-4:00 Break

VII. Session 4: Speech Processing 
4:00-4:30 A Transcription Scheme for Languages Employing the Arabic Script Motivated 
by Speech Processing Applications - Shadi Ganjavi, Panayiotis G. Georgiou and 
Shrikanth Narayanan 
4:30-5:00 Automatic Diacritization of Arabic for Acoustic Modeling in Speech 
Recognition - Dimitra Vergyri and Katrin Kirchhoff 
5:00-5:30 Letter-to-Sound Conversion for Urdu Text-to-Speech System - Sarmad Hussain 

VIII. Discussion and Closing
5:30-6:00 Ali Farghaly and Karine Megerdoomian 

Accepted papers and formal demonstrations will be published in a proceedings volume, 
which will be made available at the workshop. 



WORKSHOP REGISTRATION 

For the workshop to take place, the COLING 2004 organizers require at least 20 
participants to register for it. Speakers and participants are therefore asked to 
register as soon as possible via the official COLING 2004 website at 
http://www.issco.unige.ch/coling2004/. 

Workshop fees (in Swiss Francs): 
* Student early CHF 90 
* Student late CHF 120 
* Student on-site CHF 150 
* Regular early CHF 120 
* Regular late CHF 150 
* Regular on-site CHF 180 


ORGANIZING COMMITTEE 
 
Ali Farghaly (SYSTRAN Software, Inc.) 
Karine Megerdoomian (Inxight Software and University of California, San Diego) 


PROGRAM COMMITTEE 

Jan W. Amtrup (Bowne Global Solutions) 
Tim Buckwalter (Linguistic Data Consortium) 
Miriam Butt (Konstanz University, Germany) 
Violetta Cavalli-Sforza (Carnegie Mellon University) 
Joseph Dichy (Lyon University) 
Abdelkadir Fassi Fehri (Mohammed V University-Souissi Rabat, Morocco) 
Andrew Freeman (University of Washington) 
Nizar Habash (University of Maryland, College Park) 
Masayo Iida (Inxight Software, Inc) 
Simin Karimi (University of Arizona) 
Martin Kay (Stanford University) 
Kevin Knight (USC/Information Sciences Institute) 
Farhad Oroumchian (University of Wollongong in Dubai) 
Ahmed Rafea (The American University in Cairo) 
Jean Senellart (SYSTRAN Software) 
Bonnie Glover Stalls (University of Southern California) 
Rémi Zajac (SYSTRAN Software) 


