[Corpora-List] CFP: The 1st Workshop on NLP for Languages Using Arabic Script (AbjadNLP 2025)

amalhaddad via Corpora Sat, 10 Aug 2024 10:31:12 -0700

The 1st Workshop on NLP for Languages Using Arabic Script


(AbjadNLP 2025)

Abu Dhabi, UAE

19-20 January 2025

Submission URL: https://softconf.com/coling2025/AbjadNLP25/

Co-located with COLING 2025 Conference, Abu Dhabi, UAE (19-20 January 2025)

AbjadNLP is dedicated to advancing innovation and gaining deeper insights into Natural Language Processing (NLP) for languages that use the Arabic script. Our primary focus is on Abjad and Ajami languages that utilise the Arabic script or its variations. Traditionally associated with Semitic languages, Abjad scripts represent consonants in every syllable. In contrast, Ajami scripts denote the alphabetic use of the Arabic script in various African contexts, representing non-Arabic languages. We are interested in research on languages that fall under the Abjad or Ajami categories that use the Arabic script or any variations of it.

We invite contributions, discussions, and explorations that delve deep into the unique linguistic structures, resources, challenges, and untapped potential presented by Abjad and Ajami languages within the realm of NLP and language resources. Our goal is to create synergies among researchers by addressing the diverse phenomena and challenges inherent in these rich linguistic traditions.

The workshop is proud to highlight our connections with the Masakhane NLP community and collaborations with institutions worldwide, such as COMSATS on Urdu, and the long-standing UCREL NLP Group at Lancaster University, whose work encompasses over 20 languages worldwide, including Abjad and Ajami languages.

Note: We chose the name Abjad for simplicity, but our focus includes Abjad and other languages that have adopted the Arabic and Perso-Arabic scripts, as well as Ajami languages. We acknowledge that Sorani Kurdish, when written in Arabic script, follows an alphabet style rather than an Abjad style.


Workshop Description:

We welcome contributions, discussions, and explorations that thoroughly investigate the distinctive linguistic structures, resources, challenges, and untapped potential of Abjad and Ajami languages within the field of NLP and language resources. Our aim is to foster collaboration among researchers by addressing the varied phenomena and challenges inherent in these rich linguistic traditions.

Ajami languages, representing a myriad of African languages that have adopted the Arabic script, span at least 43 distinct languages, including Hausa, Fulfulde, Mandingo, Swahili, Wolof, Kanuri, and Tamazight. The combined number of speakers of these languages is estimated to exceed 200 million within Africa alone. Although Abjad has been traditionally associated with Semitic languages such as Arabic, Hebrew and Syriac, it has been adopted for writing by many other language communities as in Perso-Arabic scripts used in Persian, Urdu, Pashto, Sorani Kurdish, Azeri Turkish, Sindhi, and Uyghur, with a collective estimated speaker population exceeding 500 million. Altogether, these languages represent an approximate global aggregate of 1 billion speakers.

The adoption of the Arabic script across diverse linguistic landscapes highlights its expansive and varied application, transcending genres such as governmental correspondences, poetic compositions, religious texts, and journalistic pursuits. This widespread use underscores the imperative need to enhance digital infrastructure, tools, and resources for these under-resourced languages. Advancing such resources is crucial to nurturing linguistic diversity and resilience in both digital and print media, ensuring the preservation of linguistic heritage in the digital age.

Currently, there is an increasing interest in various NLP communities, both in academia and industry, in writing systems. However, there is a lack of initiatives focusing on the diverse phenomena and challenges of the languages using an Abjad script. The AbjadNLP workshop aims to fill this gap, fostering collaboration and innovation in this vital area of study.


Motivation

Languages employing an Abjad script signify a pivotal and diverse fragment of the global linguistic mosaic, traversing numerous countries and regions and embodying a considerable populace of speakers. The linguistic wealth and geographical diffusion of languages covered by AbjadNLP present a prolific environment for exploration and advancement in NLP. By channeling attention towards these languages, the realm of NLP is poised to unlock access to an expansive and varied array of linguistic constructions, subtleties, and cultural contexts, pivotal for bolstering the versatility and adaptability of NLP models and applications. The extensive spectrum of these languages not only unfolds a valuable opportunity to amplify multilingualism and multiculturalism in NLP research but also forges pathways for addressing the requisites and challenges intrinsic to a diverse and extensive speaker population.

The broad adoption of Abjad scripts transcends diverse genres, including governmental correspondences, poetic compositions, religious texts, and journalistic pursuits. The sustained use of such scripts underscores the imperative need to enhance digital infrastructure, tools, and resources that elucidate the varied writing systems inherent to under-resourced languages. Such advancement is crucial to nurturing linguistic diversity and resilience in both digital and print media, ensuring that the linguistic heritage does not diminish in the digital age.

This workshop can contribute to more inclusive and equitable progressions in NLP, accommodating a broader assortment of languages and dialects and promoting enhanced comprehension and interconnectivity amongst varied linguistic communities. The assimilation and prioritization of these linguistically affluent and diverse languages are indispensable for the comprehensive progression and the universal adaptability of NLP technologies.

While our workshop primarily targets languages using an Abjad script, we recognize that many historical languages such as Aramaic, Sogdian, Parthian and Phoenician employed such a writing system. As such, we believe that our workshop can strengthen links with researchers working on endangered languages as well.

We are proud to highlight our existing connections with the Masakhane NLP community (www.masakhane.io) and collaborations with institutions worldwide, such as COMSATS on Urdu (www.comsats.edu.pk), and the long-standing UCREL NLP Group at Lancaster University, whose work encompasses over 20 languages worldwide, including Abjad and Ajami languages (http://ucrel-web-dev.lancs.ac.uk/ucrelng/).


Team

Our team is uniquely diverse and gender-balanced, comprising individuals from a wide range of ethnic backgrounds. We represent a spectrum of languages that use the Arabic script and include researchers from both Linguistics and NLP, enriching the ever-needed collaboration between these two fields. With expertise in language technology, Unicode, NLP, resources, and multilingual text analysis, together, we aim to foster a dynamic and inclusive environment for research and collaboration in the field of NLP.


Call for papers

We invite submissions on topics that include, but are not limited to, the following:

* Enabling core technologies: morphological analysis, disambiguation, tokenisation, POS tagging, named entity detection, chunking, parsing, semantic role labelling, sentiment analysis, language modelling, etc.
* Applications: machine translation, speech recognition, speech synthesis, optical character recognition, pedagogy, assistive technologies, social media, etc.
* Resources: dictionaries, annotated data, corpora, etc.

In addition, we extend a warm invitation to researchers and stakeholders across the spectrum to contribute papers focusing on, but not limited to, the following dimensions:


        * Orthography descriptions (Constable 2002; Hosken 2003)
        * Advancements in Font Technology, Glyph Rendering, and OCR
        * Text Input Methodologies
        * Development and Utilisation of Exploitable Dictionaries
        * Enhancements in Spell-Checking Support
        * Advancements in Speech-to-Text Solutions
        * Progressive Natural Language Processing Techniques
        * BLARK - Basic Language Resource Kit descriptions for languages using Abjad or Ajami.
        * Insights and Experiences Utilising Data Supplied by the Unicode Hosted Common Locale Data Repository in Abjad or Ajami.
        * Morphological and syntactical challenges in Abjad or Ajami Orthographies.
        * Development of open access corpora in Abjad or Ajami.
        * Text processing and transliteration challenges and solutions for languages using Abjad or Ajami.
        * Cultural and sociolinguistic considerations in NLP applications for Abjad or Ajami.
        * Language resources and NLP tools for Abjad or Ajami.

Summary of the Call:

We welcome submissions of papers centred around the Abjad and Ajami theme, focusing on supporting NLP language resources for non-Arabic languages utilising Arabic script. We encourage submissions that span a spectrum from theoretical investigations to practical applications, aiming to underscore the distinctive challenges, solutions, and insights that languages using Ajami and Abjad scripts introduce to the field of NLP.

For the submission format and guidelines, we follow the COLING 2025 standards. Authors are encouraged to thoroughly review and adhere to the COLING 2025 submission guidelines and author kit, which can be found at: https://coling2025.org/calls/submission_guidlines/.

If authors are describing an orthography, we request that they include the points recommended in Hosken (2003), https://scripts.sil.org/WP-Encoding. For continuity across the workshop and greater impact across industry applications, authors should consider the terminological differences (orthography, script, writing system, etc.) presented by Constable (2002), https://www.sil.org/resources/publications/entry/7853. The model presented by Constable is the current Unicode model.

Please ensure that all submissions strictly conform to these standards to streamline the review process and maintain uniformity across all contributions. Both long papers (up to 8 pages) and short papers (up to 4 pages) are welcome. All submissions will undergo a rigorous peer-review process, emphasizing originality, relevance, and clarity.


Submissions may be of two types:

* Long papers - up to eight (8) pages maximum, presenting substantial, original, completed, and unpublished work.
* Short papers - up to four (4) pages, describing a small focused contribution, negative results, system demonstrations, etc.


Submission URL: https://softconf.com/coling2025/AbjadNLP25/

Submission Guidelines: https://coling2025.org/calls/submission_guidlines/


Provisional Key Dates:

        * 1st Call for Papers Announcement: 16 July 2024
        * 2nd Call for Papers Announcement: 16 August 2024
        * Paper Submission Deadline: 15 November 2024
        * Notification of Paper Acceptance: 6 December 2024
        * Camera-ready Paper Deadline: 13 December 2024
        * Workshop Date: either 19 or 20 January 2025

Anti-Harassment Policy:

The workshop supports the COLING anti-harassment policy: https://coling2022.org/policy


Organising Committee:

General Chair:

* Dr. Mo El-Haj, Senior Lecturer at Lancaster University, is a Natural Language Processing expert with a focus on Arabic and under-resourced languages. He founded the FNP workshop series in 2018 and has organised workshops at top NLP conferences including LREC and COLING. http://elhaj.uk/ [1]


Programme Chairs:

* Mr Hugh Paterson III. Collaborative Scholar in linguistics, writing systems, metadata, and archives. http://hugh4.us [2]
* Dr Saad Ezzini. Lecturer at Lancaster University, UK. Saad has experience working on low-resource languages with a focus on machine translation, QA, IR, and language modelling. http://ezzini.github.io [3]
* Dr Ignatius Ezeani. Senior Research Associate working on multilingual NLP, Lancaster University, UK. https://www.lancaster.ac.uk/scc/about-us/people/ignatius-ezeani [4]


Review Committee:

* Dr Manum Hayat Khan. Cognitive Linguistics Researcher at the University of La Rioja, Spain. https://investigacion.unirioja.es/investigadores/1183/detalle
* Dr Muhammad Sharjeel. Assistant Professor working on Urdu NLP, COMSATS University Islamabad, Pakistan. https://scholar.google.com/citations?user=xUF3l9gAAAAJ&hl=en [5]


Publication Chair:

* Dr Sina Ahmadi. Postdoctoral researcher at University of Zurich focusing on leveraging language technology to assist languages with constrained digital resources, with an emphasis on adapting current natural language processing approaches and existing resources for less-resourced languages. https://sinaahmadi.github.io/


Publicity Chairs:

* Ms Cynthia Amol. NLP Researcher focusing on low-resource languages at Maseno University, Kenya. https://ke.linkedin.com/in/cynthia-amol-779668119
* Ms Amal Haddad Haddad. PhD Student in translations and terminologies at the University of Granada, Spain. http://lexicon.ugr.es/haddad
* Dr Jaleh Delfani. Research Fellow in Translation at University of Surrey. https://www.surrey.ac.uk/people/jaleh-delfani [6]


Advisory Committee:

* Prof. Ruslan Mitkov, Professor of Computing and Communications at Lancaster University, actively working on different research topics from the areas of Natural Language Processing (NLP), Computational Linguistics and Translation Technology. https://wp.lancs.ac.uk/mitkov/ [7]
* Prof. Paul Rayson, Director of UCREL research centre at Lancaster University, specialises in semantic-based NLP across 20 languages, including Urdu and Arabic. With 25 years of experience, he excels in noisy language environments like financial disclosures and has organised various conferences and workshops. https://www.lancaster.ac.uk/staff/rayson/ [8]


Programme Committee*

        * Abdoulaye Diallo. Fula & Wolof. Independent Researcher
        * Ahmed Abdelali. Arabic/Multilingual NLP. Qatar Computing Research Institute (QCRI), Qatar.
        * Ahmed AbuRa'ed. Arabic. UBC. Canada.
        * Alp Oktem. Tigrinya and Kanuri. Translators without Borders
        * Antonio Moreno Sandoval. Low Resourced Languages. UAM. Spain
        * Azizud Din. Pashto. University Malaysia Sarawak. Malaysia
        * Behnam Sabeti. Persian. Miras Technologies International. Iran
        * Chenggang Mi. Uyghur. Xinjiang Technical Institute. China
        * Clement Oyeleke. Yoruba. University of Ibadan. Nigeria
        * Daniel Whitenack. Kimbundu, Fulfulde, Pular. SIL International. USA
        * Derguene Mbaye. Wolof. Baamtu. Senegal
        * Djamel Mostefa. Pashto. ELDA, France.
        * Doaa Samy. Arabic. Cairo University, Egypt and LLI-UAM. Spain
        * Elias W BA. Fula and Wolof. Baamtu. Senegal
        * Eric Atwell. Arabic/Multilingual NLP. Leeds University, UK.
        * Frederick Apina. Swahili. Parrot.AI. Tanzania
        * George Giannakopoulos. Multilingual NLP. SKEL Lab - NCSR Demokritos. Greece
        * Haithem Afli. Arabic/Multilingual NLP. Dublin City University, Ireland.
        * Hazem Hajj. Arabic/Multilingual NLP. American University of Beirut, Lebanon.
        * Houda Bouamor. Arabic/Multilingual NLP. CMU. Qatar
        * Ignatius Ezeani. Igbo, African Languages NLP. Lancaster University, UK.
        * Imed Zitouni. Arabic/Multilingual NLP. Microsoft Research, USA.
        * Karim Bouzoubaa. Arabic/Multilingual NLP. Mohamed Vth University, Morocco.
        * Mariam Masoud. Sorani Kurdish. National University of Ireland Galway. Ireland
        * Lei Wang. Uyghur. Xinjiang Technical Institute. China
        * Marina Litvak. Hebrew and Arabic. Sami Shamoon College of Engineering, Israel
        * Mo El-Haj. Arabic/Multilingual and Low resourced Languages. Lancaster University, UK
        * Muhammad Sharjeel. Urdu. COMSATS University Islamabad, Pakistan.
        * Omid Momenzadeh. Persian. Miras Technologies International. Iran
        * Paul Rayson. Multilingual and Low resourced Languages. Lancaster University, UK
        * Preni Golazizian. Persian. Miras Technologies International. Iran
        * Rao Muhammad Adeel Nawab. Urdu. COMSATS University Islamabad, Pakistan.
        * Reza Fahmi. Persian. Miras Technologies International. Iran
        * Samuel Olanrewaju. Yoruba, Yagba and Basa. University of Ibadan. Nigeria
        * Scott Piao. Multilingual and Low resourced Languages. Lancaster University, UK
        * Seyed Arad Ashrafi Asli. Persian. Miras Technologies International. Iran
        * Shervin Malmasi. Sorani Kurdish. Macquarie University. Australia
        * Sina Ahmadi. Sorani Kurdish. National University of Ireland Galway. Ireland
        * Sokhar Samb. Wolof. ML & NLP. Senegal
        * Tonghai Jiang. Uyghur. Xinjiang Technical Institute. China
        * Waziri Shebogholo. Swahili. Parrot.AI. Tanzania
        * Wole Akin. IsiXhosa, Yorùbá, Hausa, and Igbo. University of Johannesburg. South Africa
        * Xi Zhou. Uyghur. Xinjiang Technical Institute. China
        * Yating Yang. Uyghur. Xinjiang Technical Institute. China
        * Zahra Majdabadi. Persian. Miras Technologies International. Iran

_*We are in the process of forming a linguistically diverse programme committee of experts in languages that use Arabic script (Abjad and Ajami), with the majority of the list already confirmed to serve as reviewers. As soon as we gain access to SoftConf, we will extend invitations to the remaining committee members. If your name appears in this list and you want it removed, please contact any of the organisers as soon as possible and we will make sure it is removed. Thanks_




Links:
------
[1] http://elhaj.uk/
[2] http://hugh4.us/
[3] http://ezzini.github.io/
[4] https://www.lancaster.ac.uk/scc/about-us/people/ignatius-ezeani
[5] https://scholar.google.com/citations?user=xUF3l9gAAAAJ&hl=en
[6] https://www.surrey.ac.uk/people/jaleh-delfani
[7] https://wp.lancs.ac.uk/mitkov/
[8] https://www.lancaster.ac.uk/staff/rayson/

