Dear all,

I’m pleased to announce the release of AraFinNews, the largest Arabic
financial-news summarisation dataset to date.

GitHub: https://github.com/ArabicNLP-UK/AraFinNews
HuggingFace: https://huggingface.co/datasets/drelhaj/AraFinNews
Paper: https://arxiv.org/abs/2511.01265

AraFinNews includes 212,500 article–headline pairs (2015–2025) from reputable 
financial media, making it a strong Arabic counterpart to benchmarks like 
CNN/DailyMail. It provides clean, structured text and article-level metadata,
with ready-to-use splits for summarisation and for other downstream
financial NLP tasks.
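
For anyone who wants a quick look at the data, here is a minimal Python sketch of pulling a few article–headline pairs from the HuggingFace repository linked above. It is only an illustration under stated assumptions, not an official loader: the repository ID comes from the URL, but the split name "train" and the field names "article" and "headline" are hypothetical, so please check the dataset card for the actual names (and for whether a configuration name is required).

```python
from typing import Dict, List, Tuple

# Repository ID taken from the HuggingFace URL in this announcement.
DATASET_ID = "drelhaj/AraFinNews"


def to_pair(record: Dict[str, str],
            article_key: str = "article",
            headline_key: str = "headline") -> Tuple[str, str]:
    """Turn one record into a (source, target) pair for headline generation.

    The default key names are hypothetical placeholders; pass the real
    column names once you have checked the dataset card.
    """
    return record[article_key].strip(), record[headline_key].strip()


def load_pairs(split: str = "train", limit: int = 5) -> List[Tuple[str, str]]:
    """Load up to `limit` pairs from one split (needs network access).

    The split name "train" is an assumption, not confirmed by the announcement.
    """
    # Imported here so that to_pair() works without the library installed.
    from datasets import load_dataset  # pip install datasets

    dataset = load_dataset(DATASET_ID, split=split)
    n = min(limit, len(dataset))
    return [to_pair(dataset[i]) for i in range(n)]
```

Calling load_pairs("train", limit=3) would then return up to three (article, headline) tuples, provided the assumed split and field names match the released data.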
Using this dataset, we evaluated mT5, AraT5, and the domain-adapted FinAraT5.
The results show clear gains from financial-domain pretraining in
headline generation.
Best wishes,
Mo


See more of our NLP datasets and tools on:

ArabicNLP.uk: https://arabicnlp.uk/

HuggingFace: https://huggingface.co/drelhaj

GitHub: https://github.com/drelhaj

ArabicNLP-UK: https://github.com/orgs/ArabicNLP-UK/

UCREL: https://github.com/UCREL


————


Dr Mo El-Haj
Director of NLP @ VinUniversity
Reader (Associate Professor) in NLP
CECS, VinUniversity, Vietnam
SCC, Lancaster University, UK
https://elhaj.uk/
https://vinnlp.com/
