WojoodNER 2024
The 2nd Arabic Named Entity Recognition Shared Task at ArabicNLP’24

https://dlnlp.ai/st/wojood/
 
We invite you to participate in the second shared task on named entity
recognition (NER) in Arabic text. Participants will receive the new Wojood
corpus (550K tokens + fine-grained entity types). The shared task comprises
three subtasks, and teams may take part in any of them; one subtask concerns
the War on Gaza, and participants may use external data for that subtask.

Dataset: Wojood-Fine <https://aclanthology.org/2023.arabicnlp-1.25/>, a new
version of Wojood for Arabic fine-grained entity recognition (the original
Wojood entity types plus subtypes).

Subtask-1 (Closed-Track Flat Fine-Grain NER): We provide the Wojood-Fine Flat
train (70%) and development (10%) datasets. The final evaluation will be on the
test set (20%). External data is not allowed ... (read more
<https://dlnlp.ai/st/wojood/>).

Subtask-2 (Closed-Track Nested Fine-Grain NER): This subtask is similar to
Subtask-1: we provide the Wojood-Fine Nested train (70%) and development (10%)
datasets. The final evaluation will be on the test set (20%) ... (read more
<https://dlnlp.ai/st/wojood/>).

Subtask-3 (Open-Track NER - Gaza War): This subtask allows participants to
reflect on the utility of NER in the context of real-world events, permits the
use of external resources, and encourages the use of generative models in
different settings (fine-tuning, zero-shot learning, in-context learning,
etc.). The goal of focusing on generative models in this particular subtask is
to help the Arabic NLP research community better understand the capabilities
and performance gaps of LLMs in information extraction, an area that is
currently understudied.
We provide development and test data related to the current War on Gaza. This
is motivated by the assumption that discourse about recent global events will
involve mentions drawn from a different data distribution. For this subtask, we
include data from five different news domains related to the War on Gaza, but
we keep the names of the domains hidden. Participants will be given a
development dataset (10K tokens, 2K from each of the five domains) and a test
dataset (50K tokens, 10K from each domain). Both the development and test sets
are manually annotated with fine-grained named entities using the same
annotation guidelines used in Subtask-1 and Subtask-2 (also described in
Liqreina et al., 2023) ... (read more <https://dlnlp.ai/st/wojood/>).
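As a hedged illustration of the in-context-learning route mentioned above (not
an official baseline), the Python sketch below prompts a generative model to
extract entities from a sentence. It assumes the OpenAI Python client; the
model name, prompt wording, and example sentence are placeholders that
participants would adapt to their chosen LLM:

    # Minimal zero-shot NER prompting sketch. Assumptions: the OpenAI
    # Python client (any instruction-tuned LLM could be substituted);
    # the model name and prompt are illustrative placeholders.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    sentence = "زار الوفد مدينة غزة أمس."  # "The delegation visited Gaza City yesterday."
    prompt = (
        "Extract all named entities from the following Arabic sentence. "
        "Return one entity per line as: entity_text <TAB> entity_type "
        "(e.g., PERS, ORG, GPE, DATE).\n\n"
        f"Sentence: {sentence}"
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    print(response.choices[0].message.content)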


BASELINES

Two baseline models trained on Wojood-Fine (flat and nested) are provided (see
Liqreina et al., 2023 <https://aclanthology.org/2023.arabicnlp-1.25/>). The
code used to produce these baselines is available on GitHub
<https://github.com/SinaLab/ArabicNER>.

Subtask                              Precision   Recall   Micro-F1
Flat Fine-Grain NER (Subtask 1)      0.8870      0.8966   0.8917
Nested Fine-Grain NER (Subtask 2)    0.9179      0.9279   0.9229
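The scores above are span-level precision, recall, and micro-F1 over IOB2 tag
sequences; the official scorer ships with the ArabicNER repository. As a
hedged illustration of how such metrics are computed, here is a small Python
example using the seqeval library (the tag sequences are made up):

    # Span-level micro P/R/F1 over IOB2 sequences with seqeval
    # (illustration only; use the official scorer for the shared task).
    from seqeval.metrics import precision_score, recall_score, f1_score

    y_true = [["B-ORG", "I-ORG", "O", "B-GPE"]]  # gold: one ORG span, one GPE span
    y_pred = [["B-ORG", "I-ORG", "O", "O"]]      # predicted: the ORG span only

    print(precision_score(y_true, y_pred))  # 1.0  (1 of 1 predicted spans correct)
    print(recall_score(y_true, y_pred))     # 0.5  (1 of 2 gold spans found)
    print(f1_score(y_true, y_pred))         # ~0.667 (harmonic mean of P and R)
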
GOOGLE COLAB NOTEBOOKS

To allow you to experiment with the baselines, we authored four Google Colab
notebooks that demonstrate how to train and evaluate our baseline models (a
generic inference sketch follows the list).
[1] Train Flat Fine-Grain NER
<https://gist.github.com/mohammedkhalilia/72c3261734d7715094089bdf4de74b4a>:
This notebook can be used to train our ArabicNER model on the flat fine-grain
NER task using the sample Wojood-Fine data.
[2] Evaluate Flat Fine-Grain NER
<https://gist.github.com/mohammedkhalilia/c807eb1ccb15416b187c32a362001665>:
This notebook uses the trained model saved from the notebook above to perform
evaluation on an unseen dataset.
[3] Train Nested Fine-Grain NER
<https://gist.github.com/mohammedkhalilia/a4d83d4e43682d1efcdf299d41beb3da>:
This notebook can be used to train our ArabicNER model on the nested
fine-grain NER task using the sample Wojood-Fine data.
[4] Evaluate Nested Fine-Grain NER
<https://gist.github.com/mohammedkhalilia/9134510aa2684464f57de7934c97138b>:
This notebook uses the trained model saved from the notebook above to perform
evaluation on an unseen dataset.

REGISTRATION

Participants need to register via this form (NERSharedTask 2024)
<https://docs.google.com/forms/d/1ISMILgQYfUug3XuDpxFmuPASXkWaduYOUc3xOZuGwqU/edit?ts=65a82a3a>.
Participating teams will be provided with common training and development
datasets. No external manually labelled datasets are allowed (except in the
open-track Subtask-3, where external resources are permitted). A blind test
set will be used to evaluate the output of the participating teams. Each team
is allowed a maximum of three submissions. All teams are required to report
results on the development and test sets (after results are announced) in
their write-ups.

FAQ

For any questions related to this task, please check our Frequently Asked
Questions
<https://docs.google.com/document/d/1W_13FRpP3NbDx_ALYJWA3-ESXPRVomOjNovUuYfdmI0/edit?usp=sharing>.

IMPORTANT DATES

- February 25, 2024: Shared task announcement.
- March 1, 2024: Release of training data, development sets, scoring script, 
and Codalab links.
- April 5, 2024: Registration deadline.
- April 26, 2024: Test set made available.
- May 3, 2024: Codalab Test system submission deadline.
- May 10, 2024: Shared task system paper submissions due.
- June 17, 2024: Notification of acceptance.
- July 1, 2024: Camera-ready version.
- August 16, 2024: ArabicNLP 2024 conference in Thailand.

CONTACT

For any questions related to this task, please contact the organizers directly
using the following email address: [email protected]
<mailto:[email protected]>.

ORGANIZERS

- Mustafa Jarrar, Birzeit University
- Muhammad Abdul-Mageed, University of British Columbia & MBZUAI
- Mohammed Khalilia, Birzeit University
- Bashar Talafha, University of British Columbia
- AbdelRahim Elmadany, University of British Columbia
- Nagham Hamad, Birzeit University






