----------------------------------------------
Invitation to Arabic NER Shared Task 2023
----------------------------------------------

Dear colleagues,

We are happy to invite you to join the Arabic NER Shared Task 2023, which will
be organized as part of WANLP 2023. We will provide you with a large corpus
and Google Colab notebooks to help you reproduce the baseline results.

Invitation to participate in a competition on extracting named entities from
Arabic text. We will provide participants with a corpus and software for
obtaining baseline results that they can build upon.

For more details, please visit the shared task website
(https://dlnlp.ai/st/wojood/).
You can register directly via the following form:
https://docs.google.com/forms/d/e/1FAIpQLSeWwvGSRMcSa7CHStGLkE8ODY87571wGD2_VKd1t_n5xsGSyg/viewform

-----------------------------------------
INTRODUCTION
-----------------------------------------
Named Entity Recognition (NER) is integral to many NLP applications. It is the
task of identifying named entity mentions in unstructured text and classifying
them into predefined classes such as person, organization, location, or date.
Due to the scarcity of Arabic resources, most research on Arabic NER focuses on
flat entities and addresses a limited number of entity types (person,
organization, and location). The goal of this shared task is to alleviate this
bottleneck by providing Wojood, a large and rich Arabic NER corpus. Wojood
consists of about 550K tokens of Modern Standard Arabic (MSA) and dialectal
Arabic, drawn from multiple domains and manually annotated with 21 entity
types.

-----------------------------------------
REGISTRATION
-----------------------------------------
Participants need to register via this form
(https://forms.gle/UCCrVNZ2LaPviCZS6). Participating teams will be provided
with common training and development datasets. No external manually labelled
datasets are allowed. A blind test set will be used to evaluate the outputs of
the participating teams. Each team is allowed a maximum of 3 submissions. All
teams are required to report results on both the development and test sets
(after the results are announced) in their write-ups.

-----------------------------------------
FAQ
-----------------------------------------
For any questions related to this task, please check our Frequently Asked
Questions (FAQ) document
(https://docs.google.com/document/d/1XE2n89mFLic2P9DO_sAD51vy734BOt0kgtZ6bFfBUW8/edit).

-----------------------------------------
IMPORTANT DATES
-----------------------------------------
The following dates are subject to change:
         - March 03, 2023: Registration available
         - March 25, 2023: Data sharing and development-set evaluation available
         - April 10, 2023: Registration deadline
         - May 20, 2023: Test set made available
         - May 30, 2023: Evaluation on test set (TEST) deadline
         - June 25, 2023: Shared task system paper submissions due
         - July 15, 2023: Notification of acceptance
         - July 30, 2023: Camera-ready version
         - TBA, 2023: WANLP 2023 Conference
         * All deadlines are 11:59 PM UTC-12:00 (Anywhere on Earth).

-----------------------------------------
CONTACT
-----------------------------------------
For any questions related to this task, please contact the organizers directly
at the following email address: [email protected], or join the Google
Group: https://groups.google.com/g/ner_sharedtask2023.

-----------------------------------------
SHARED TASK
-----------------------------------------
This shared task targets both flat and nested Arabic NER. It consists of two
subtasks:

Subtask 1: Flat NER
In this subtask, we provide the Wojood-Flat training (70%) and development
(10%) datasets. The final evaluation will be on the test set (20%). The flat NER
dataset uses the same train/development/test split as the nested NER dataset,
and each split contains the same text. The only difference is that in the flat
dataset each token is assigned a single tag: the first high-level tag assigned
to that token in the nested NER dataset.
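To illustrate how the flat tags relate to the nested ones, the following Python
sketch derives a flat tag sequence by keeping only the first tag of each token.
It assumes a hypothetical in-memory format in which each token's nested tags
are given as a list ordered with the high-level (outermost) tag first, and an
empty list means no entity. The helper name `to_flat_tags` and the sample tags
are illustrative only; they are not taken from the Wojood release or the
baseline code.

```python
from typing import List


def to_flat_tags(nested_tags: List[List[str]]) -> List[str]:
    """Return one tag per token: the first (high-level) nested tag, or "O".

    Assumption (hypothetical format): nested_tags[i] lists the IOB2 tags of
    token i, ordered from the outermost/high-level entity inward.
    """
    return [tags[0] if tags else "O" for tags in nested_tags]


# Hypothetical sample: "جامعة بيرزيت في فلسطين" ("Birzeit University in Palestine").
# "بيرزيت" is inside an ORG mention and also starts a nested GPE mention.
tokens = ["جامعة", "بيرزيت", "في", "فلسطين"]
nested = [["B-ORG"], ["I-ORG", "B-GPE"], [], ["B-GPE"]]

flat = to_flat_tags(nested)
for token, tag in zip(tokens, flat):
    print(token, tag)
```

Under this assumption the nested GPE tag on the second token is simply dropped,
which is the information a flat model never sees.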

Subtask 2: Nested NER
In this subtask, we provide the Wojood-Nested training (70%) and development
(10%) datasets. The final evaluation will be on the test set (20%).

-----------------------------------------
METRICS
-----------------------------------------
The evaluation metrics will include precision, recall, and F1-score. The
official metric will be the micro-averaged F1-score.
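Micro-averaging pools true positives, false positives, and false negatives
across all entity types before computing precision, recall, and F1, so frequent
types weigh more than they would under macro-averaging. The sketch below is a
minimal Python illustration, assuming each entity is a hypothetical
(sentence id, start token, end token exclusive, type) tuple and that a
prediction counts only if its span and type match a gold entity exactly. It is
not the official CodaLab scorer, whose matching rules may differ. Representing
entities as a set of spans also accommodates nested (overlapping) mentions.

```python
from typing import Iterable, Tuple

# Hypothetical entity representation: (sentence id, start token, end token (exclusive), type).
Entity = Tuple[int, int, int, str]


def micro_prf(gold: Iterable[Entity], pred: Iterable[Entity]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 with exact span+type matching."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # correct predictions, pooled over all types
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up toy data: the nested GPE inside the ORG is missed,
# and the predicted PERS span is one token too long.
gold = {(0, 0, 2, "ORG"), (0, 1, 2, "GPE"), (0, 3, 4, "GPE"), (1, 0, 1, "PERS")}
pred = {(0, 0, 2, "ORG"), (0, 3, 4, "GPE"), (1, 0, 2, "PERS")}

p, r, f = micro_prf(gold, pred)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.667 R=0.500 F1=0.571
```

Here 2 of 3 predictions are correct and 2 of 4 gold entities are found, giving
F1 = 2 * (2/3) * (1/2) / (2/3 + 1/2) = 4/7.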

The evaluation for each subtask will be hosted on CodaLab. Teams will be
provided with a CodaLab link for each subtask:

         - CodaLab link for NER Shared Task Subtask 1 (Flat NER)
         - CodaLab link for NER Shared Task Subtask 2 (Nested NER)

-----------------------------------------
BASELINES
-----------------------------------------
Two baseline models trained on Wojood (flat and nested) are provided:

Nested NER baseline: described in this article, with code available on
GitHub. The model achieves a micro F1-score of 0.9059 (note that this baseline
does not handle nested entities of the same type).

Flat NER baseline: the same GitHub repository used for nested NER can also be
used to train a model for the flat NER task. Our flat NER baseline achieves a
micro F1-score of 0.8785.

-----------------------------------------
GOOGLE COLAB NOTEBOOKS
-----------------------------------------
To help you experiment with the baselines, we authored four Google Colab
notebooks that demonstrate how to train and evaluate our baseline models.
[1] Train Flat NER: This notebook can be used to train our ArabicNER model on
the flat NER task using the sample Wojood data found in our repository.
[2] Evaluate Flat NER: This notebook uses the model trained and saved in
notebook [1] to perform evaluation on an unseen dataset.
[3] Train Nested NER: This notebook can be used to train our ArabicNER model on
the nested NER task using the sample Wojood data found in our repository.
[4] Evaluate Nested NER: This notebook uses the model trained and saved in
notebook [3] to perform evaluation on an unseen dataset.


-----------------------------------------
ORGANIZERS
-----------------------------------------
         - Mustafa Jarrar, Birzeit University
         - Muhammad Abdul-Mageed, University of British Columbia & MBZUAI
         - Mohammed Khalilia, Birzeit University
         - Bashar Talafha, University of British Columbia
         - AbdelRahim Elmadany, University of British Columbia
         - Nagham Hamad, Birzeit University
         - Alaa Omer, Birzeit University
_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/