OSACT 2026 Workshop, Third Call for Papers
11 May 2026, Palma de Mallorca, Spain
https://osact-lrec.github.io

Hosted by LREC 2026
https://lrec2026.info/

Workshop Description
The Open-Source Arabic Corpora and Processing Tools (OSACT) workshop series 
provides a forum for researchers, practitioners, and students in computational 
linguistics (CL), natural language processing (NLP), and information retrieval 
(IR) to share and discuss ongoing work on Arabic language resources and 
technologies. While Arabic remains comparatively resource-poor in relation to 
English, recent years have seen the emergence of large, freely available 
classical and Modern Standard Arabic (MSA) corpora, as well as dialectal 
corpora and processing tools.
Now in its seventh edition, OSACT7 takes an important step forward by 
celebrating this milestone with seven shared tasks, each addressing timely 
challenges in Arabic NLP and reflecting broader themes relevant to NLP research 
in general. OSACT7 builds on its long-standing commitment to open-source 
contributions that advance accessibility, reproducibility, and fairness, and 
this year it places inclusivity at the heart of its mission. A key focus is to 
recognize and support minority dialects and underrepresented varieties of 
Arabic, ensuring that diverse linguistic voices and resources are not only 
acknowledged but actively valued within the community.
The workshop will cover general topics in CL, NLP, and IR, with special 
emphasis on Large Language Models (LLMs) and Generative AI, including 
pre-trained Arabic language models, corpus design and evaluation, and annotated 
corpora for tasks such as named entity recognition, machine translation, 
sentiment analysis, and text classification. Additional areas of focus include 
crowdsourcing for data annotation, tools for language education, tokenization, 
normalisation, morphological analysis, part-of-speech tagging, dialect 
identification and translation, fake news detection, and web and social media 
analytics. Methodologies for resource creation and annotation, knowledge 
extraction, ontologies, terminology, knowledge representation, and integration 
with the Semantic Web (e.g. Linked Data, Knowledge Graphs) will also be 
explored.

Workshop Topics
The workshop welcomes submissions on topics including, but not limited to, the 
following areas:
A) Language Resources:
·         Pre-trained Arabic language models.
·         Surveys and evaluations of existing Arabic corpora and their 
associated processing tools.
·         Development and release of new annotated corpora for NLP and IR tasks 
such as named entity recognition, machine translation, sentiment analysis, text 
classification, and language learning.
·         Assessing the effectiveness of crowdsourcing platforms for Arabic 
data annotation.
·         Arabic text and speech processing toolkits.

B) Tools and Technologies:
·         Language education, including first (L1) and second (L2) language 
learning applications.
·         Pre-training & fine-tuning approaches for Arabic.
·         Tokenization, normalisation, segmentation, morphology, and POS 
tagging.
·         Sentiment analysis, dialect ID, and classification.
·         Web and social media analytics.
·         Arabic LRs for text, speech, sign, gesture, image, & multimodal data.
·         Best practices for LR interoperability.
·         Construction and annotation of LRs.
·         Knowledge extraction, acquisition, and representation.
·         Ontologies, terminology, and frameworks.
·         LRs and the Semantic Web (Linked Data, Knowledge Graphs).
·         Data contamination, synthetic data, and quality issues.

Important Dates
·         Feb 25 Feb 18, 2026: Paper submission deadline
·         March 23, 2026: Notification of acceptance
·         March 30, 2026: Camera-ready deadline
·         May 11, 2026: Workshop Date

Submission Instructions
We invite submissions of between 4 and 8 pages of content on the topics
of interest. The page limit of 8 pages does not include acknowledgements,
references, potential Ethics Statements and discussion on Limitations in
line with the policy of the main LREC conference. All submissions must
follow the LREC stylesheet (https://lrec2026.info/authors-kit/).

All submissions are double-blind. Any submissions that are not
anonymised, are over-length, are poorly formatted, or make excessive use
of appendices to circumvent page limits are liable to desk rejection.

At the time of submission, authors are offered the opportunity to share
related language resources with the community. All repository entries
are linked to the LRE Map (https://lremap.elra.info/), which provides
metadata for the resource.


Organizing Committee

  *   Hend Al-Khalifa, Professor, King Saud University, Riyadh, Saudi Arabia
  *   Mo El-Haj, Reader, VinUniversity, Vietnam, and Lancaster University, UK
  *   Saad Ezzini, Assistant Professor, King Fahd University of Petroleum and 
Minerals (KFUPM), Saudi Arabia

