We are proud to announce the release of CreoleVal - a collection of benchmarks 
for 28 Creole languages. The collection of datasets span tasks such as relation 
classification, machine comprehension, machine translation, named entity 
recognition, and use cases such as language modeling. We cover Haitian Creole, 
Bislama, Chavacano, Pitkern, Singlish, Tok Pisin, Papiamento, and others.

We hope the NLP community will include this collection of datasets in ongoing & 
future evaluations of methods directed at low-resource languages. Not only 
that, we also hypothesise that CreoleVal will open the door for controlled 
experimentation with transfer learning methodology.

This resource has been long in the making, and was made possible by a long list 
of collaborators.

For a pre-print, see: https://arxiv.org/abs/2310.19567

For code and data, see: https://github.com/hclent/CreoleVal
(Repository under construction)

_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to [email protected]

Reply via email to