*CoLI-Kanglish: Word Level Language Identification in Code-mixed
Kannada-English Texts*


*CoLI-Kanglish shared task@ICON2022*

URL: https://sites.google.com/view/kanglishicon2022/home

Registration link:
https://docs.google.com/forms/d/e/1FAIpQLSfFZR_5ugGKQnf2FYNIWnOh4rv6Bz6podD1YF6gByJ7mhjT4w/viewform


The training and test sets are now available.


*Participants are invited to publish Working Notes of ICON 2022*




*Task Description*

The task of automatically identifying the languages used in a given text is
called Language Identification (LI). LI is a pre-processing step for many
applications, and LI at the word level can be viewed as a sequence labelling
problem in which every word in a sentence is tagged either as mixed-language
or as one of the languages in a predefined set. Despite a lot of work on LI,
the problem of LI in the code-mixed scenario is still far from solved.
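
As a rough illustration of the sequence labelling view described above, the
following Python sketch assigns one tag to each word of a romanized sentence.
Everything in it is an assumption made for illustration: the tag names, the
tiny word lists, the suffix rule and the contrived example sentence are not
taken from the CoLI-Kenglish dataset or from the shared task's label set, and
a real submission would use a trained model rather than lexicon lookup.

```python
from typing import Callable, List, Tuple

# Hypothetical tag set for illustration; the shared task defines its own categories.
TAGS = ("kn", "en", "mixed", "other")

# Tiny hypothetical lexicons of romanized words, not drawn from the dataset.
KANNADA_WORDS = {"tumba", "chennagide", "guru"}
ENGLISH_WORDS = {"super", "song", "video", "movie"}


def toy_tagger(word: str) -> str:
    """Assign one tag to one word using lexicon lookup only."""
    w = word.lower()
    if w in KANNADA_WORDS:
        return "kn"
    if w in ENGLISH_WORDS:
        return "en"
    # An English stem followed by the Kannada suffix "-alli" is treated as mixed.
    if w.endswith("alli") and w[: -len("alli")] in ENGLISH_WORDS:
        return "mixed"
    return "other"


def label_sentence(
    sentence: str, tagger: Callable[[str], str] = toy_tagger
) -> List[Tuple[str, str]]:
    """Split on whitespace and return one (word, tag) pair per word."""
    return [(word, tagger(word)) for word in sentence.split()]


# Each word in the sentence receives exactly one tag.
print(label_sentence("song tumba chennagide videoalli"))
```

The tagger is passed in as a function so that the lexicon lookup can be
swapped for any other word-level classifier without changing the labelling
loop.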


India has a rich heritage of languages. Kannada is one of the Dravidian
languages and the official language of Karnataka state. People of
Karnataka read, write and speak Kannada, but many find it difficult to use
Kannada script to post messages or comments on social media. Technological
limitations such as the keyboards of computers and smartphones are one
reason; another may be the complexity of forming words with consonant
conjuncts. Hence, most users write either in Roman script alone or in a
combination of Kannada and Roman script when commenting on social media. To
address word-level LI in code-mixed Kannada-English (Kn-En) texts, comments
on Kannada YouTube videos were extracted to construct the Code-mixed
Language Identification (CoLI-Kenglish) dataset.


We encourage participants to use the CoLI-Kenglish dataset, which consists
of English, Kannada and mixed-language words in Roman script, and to submit
their methods to the Kanglish shared task, where each word is to be
identified and assigned to one of the predefined categories.



*Important Dates*

   - 2nd November – Train and test datasets are released
   - 2nd November – Submission link release
   - 16th November – Run submission deadline
   - 22nd November – Working Note submission deadline
   - 25th November – Review notifications
   - 1st December – Camera-ready due
   - 15th - 18th December – ICON 2022 Conference
     <https://lcs2.in/ICON-2022/index.html>


*NOTE:* All dates mentioned here are in Indian Standard Time (IST).


*Organizers*

Fazlourrahman Balouchzahi, Instituto Politecnico Nacional, Mexico

Sabur Butt, Instituto Politecnico Nacional, Mexico

Noman Ashraf, Dana-Farber Cancer Institute, Harvard Medical School, United
States

Asha Hegde, Department of Computer Science, Mangalore University, India

Shashirekha Hosahalli Lakshmaiah, Department of Computer Science, Mangalore
University, India

Grigori Sidorov, Instituto Politecnico Nacional, Mexico

Alexander Gelbukh, Instituto Politecnico Nacional, Mexico


*Contact*

Email: [email protected]

ICON 2022: https://www.lcs2.in/ICON-2022/index.html