[Corpora-List] HASOC 2023 tasks - Call for Participation - Hate Speech and Offensive Content Identification

Thomas Mandl via Corpora Wed, 26 Jul 2023 01:57:32 -0700

15th meeting of the Forum for Information Retrieval Evaluation: HASOC-2023


We are excited to announce the 5th edition of HASOC, consisting of four
interesting shared tasks. We invite you to participate.

*Task 1 focuses on identifying hate speech, offensive language, and
profanity in different languages using natural language processing
techniques.*

 * Task 1A deals with identifying hate and offensive content in
   Sinhala, a low-resource Indo-Aryan language spoken in Sri Lanka. The
   task involves classifying tweets into Hate and Offensive (HOF) or
   Non-Hate and Offensive (NOT). The dataset for this task is based on
   the Sinhala Offensive Language Detection dataset.
 * Task 1B focuses on identifying hate and offensive content in
   Gujarati, another low-resource Indo-Aryan language spoken by
   approximately 50 million people in India. Similarly, participants
   need to classify tweets into HOF or NOT categories. The training set
   for this task consists of around 200 tweets.

For more details, please visit the Task 1 page
<https://hasocfire.github.io/hasoc/2023/task1.html>.

*Task 2, Identification of Conversational Hate-Speech in Code-Mixed
Languages (ICHCL), addresses the challenge of identifying hate speech
and offensive content in code-mixed conversations on social media.
Code-mixed text includes multiple languages within a single
conversation. The task is divided into two subtasks.*

 * In Task 2a, participants need to perform binary classification on
   conversational tweets with tree-structured data. They must determine
   whether a tweet, comment, or reply contains hate speech, offensive
   language, or profanity (HOF) or is non-hate and offensive (NOT). The
   classification should consider both the individual content and
   support for hate expressed in the parent tweet.
 * Task 2b involves the classification of conversational tweets with
   tree-structured data into specific forms of hate. Participants must
   identify if the tweet, comment, or reply contains standalone hate
   (SHOF), contextual hate (CHOF) that supports hate expressed in the
   parent, or if it is non-hate (NONE).
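
One way to give a classifier both the individual content and the parent
context described above is to walk from a reply up to the root tweet and
join the texts. The sketch below is only an illustration: the Post
class, its field names, and the " [SEP] " separator are assumptions, not
the format of the ICHCL dataset.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Post:
    """Hypothetical node in a conversation tree (tweet, comment, or reply)."""
    post_id: str
    text: str
    parent_id: Optional[str] = None  # None for the root tweet


def context_chain(posts: Dict[str, Post], post_id: str) -> List[str]:
    """Return the texts from the root tweet down to the given post."""
    chain: List[str] = []
    seen = set()  # guards against accidental cycles in the data
    current = posts.get(post_id)
    while current is not None and current.post_id not in seen:
        seen.add(current.post_id)
        chain.append(current.text)
        current = posts.get(current.parent_id) if current.parent_id else None
    chain.reverse()  # collected leaf-to-root, so flip to root-to-leaf
    return chain


def build_input(posts: Dict[str, Post], post_id: str, sep: str = " [SEP] ") -> str:
    """Join a post with its ancestors into one classifier input string."""
    return sep.join(context_chain(posts, post_id))


posts = {
    "t1": Post("t1", "root tweet"),
    "c1": Post("c1", "a comment", parent_id="t1"),
    "r1": Post("r1", "a reply", parent_id="c1"),
}
print(build_input(posts, "r1"))
```

A label such as CHOF depends on what the parent says, so a model that
sees only the reply text cannot distinguish it from NONE in many cases.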

For more details, please visit the Task 2 webpage.
<https://hasocfire.github.io/hasoc/2023/ichcl.html>

*Task 3 aims to detect hateful spans within a sentence already
considered hateful. A hate span is a set of continuous tokens that, in
tandem, communicate the explicit hatefulness in a sentence.*

 * For instance, in the statement, "Women ... Can't live with them...
   *Can't shoot them*," the portion highlighted in bold will be
   considered a hateful span. This shared task aims to extract all such
   spans from a hateful text.
 * The input texts are all in English. The detection of hateful spans
   is achieved by mapping this into a sequence labeling problem. For
   every token of the sequences, we have manually annotated the start
   and end of a hateful span. This is done with BIO tagging, where "B"
   marks the beginning of a hate span, "I" marks the continuation of a
   hate span, and "O" marks a non-hate token. The task is then to learn
   the correct sequence of BIO tags for a given sentence. For example,
   in the above sentence, the tag sequence for the preprocessed sentence
   will be of the form
   "women can't live with them can't shoot them" → "O O O O O B I I".
   An "I" tag cannot exist on its own and will always be preceded by
   either an "I" or a "B". A "B" tag, by contrast, can be immediately
   followed by an "O" when the span is just a single word.
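
The BIO constraints described above can be checked and decoded in a few
lines. This is a minimal sketch, assuming whitespace-separated tokens
and the bare tags "B", "I", and "O"; the function names are illustrative
and are not part of any official Task 3 tooling.

```python
from typing import List, Tuple


def is_valid_bio(tags: List[str]) -> bool:
    """Return True if every 'I' directly follows a 'B' or another 'I'."""
    previous = "O"
    for tag in tags:
        if tag not in {"B", "I", "O"}:
            return False
        if tag == "I" and previous == "O":
            return False
        previous = tag
    return True


def extract_spans(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Decode BIO tags into (start, end_exclusive, text) spans."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    if not is_valid_bio(tags):
        raise ValueError("invalid BIO sequence")
    spans: List[Tuple[int, int, str]] = []
    start = None  # index where the currently open span began
    for i, tag in enumerate(tags):
        if tag in {"B", "O"} and start is not None:
            # A new 'B' or an 'O' closes the span that was open.
            spans.append((start, i, " ".join(tokens[start:i])))
            start = None
        if tag == "B":
            start = i
    if start is not None:
        # The sequence ended while a span was still open.
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans


tokens = "women can't live with them can't shoot them".split()
tags = "O O O O O B I I".split()
print(extract_spans(tokens, tags))  # [(5, 8, "can't shoot them")]
```

Running the validity check before decoding is useful for model output,
since a tagger can emit an "I" directly after an "O".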

For more details, please visit the Task 3 webpage.
<https://lcs2.in/hatenorm-2023/>

*Task 4 aims to detect hate speech in Bengali, Bodo, and Assamese
languages. It is a binary classification task. Each dataset (one per
language) consists of a list of sentences with their corresponding
class: hate or offensive (HOF) or not hate (NOT). Data is primarily
collected from Twitter, Facebook, and YouTube comments.*

The Macro F1 score will be the yardstick of the task. Team rank will be
determined based on the Macro F1 score of the first part.
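
For reference, Macro F1 is the unweighted mean of the per-class F1
scores, so the HOF and NOT classes count equally regardless of how many
examples each has. The sketch below follows that standard definition; it
is not the organizers' scoring script, and the function names and toy
labels are made up for illustration.

```python
from typing import Sequence


def f1_for_label(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """F1 score for one class, treating it as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[str], pred: Sequence[str],
             labels: Sequence[str] = ("HOF", "NOT")) -> float:
    """Unweighted mean of the per-class F1 scores."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return sum(f1_for_label(gold, pred, label) for label in labels) / len(labels)


# Toy labels: HOF gets F1 = 0.5 and NOT gets F1 = 0.75.
gold = ["HOF", "HOF", "NOT", "NOT", "NOT", "NOT"]
pred = ["HOF", "NOT", "NOT", "NOT", "NOT", "HOF"]
print(macro_f1(gold, pred))  # 0.625
```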

For more details, please visit the Task 4 webpage.
<https://sites.google.com/view/hasoc-2023-annihilate-hates/home>

Registration for all four tasks is open on our registration page.
<https://hasocfire.github.io/hasoc/2023/registration.html>

We believe that your expertise and contribution will be invaluable in
advancing the state of the art in hate speech classification. We
encourage you to participate in these shared tasks and contribute to
the research community.

Regards,
HASOC organizing team

