15th meeting of Forum for Information Retrieval Evaluation HASOC-2023

We are excited to announce the 5th edition of HASOC, consisting of four 
interesting shared tasks. We invite you to participate.

Task 1 focuses on identifying hate speech, offensive language, and profanity in 
different languages using natural language processing techniques.

  *   Task 1A is identifying hate and offensive content in Sinhala, a 
low-resource Indo-Aryan language spoken mainly in Sri Lanka. The task involves 
classifying tweets into Hate and Offensive (HOF) or Non-Hate and Offensive 
(NOT). The training set for this task is based on the Sinhala Offensive 
Language Detection dataset, which contains 10,000 tweets.
  *   Task 1B focuses on identifying hate and offensive content in Gujarati, 
another low-resource Indo-Aryan language spoken by approximately 50 million 
people in India. Similarly, participants need to classify tweets into HOF or 
NOT categories. The training set for this task consists of around 200 tweets.

For more details, please visit the Task 1 
page<https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F1690315200%2F1qOOEH-000CON-4S%7Cin6f%7C57e1b682%7C10977208%7C9441127%7C64C02A350B98BF72E7D5D52BC3310027&o=%2Fphtk%3A%2Futstl01gani..%2F.en%2FomlacSZ3yixCb7bte6oxpFh8nz4kSfS5ewxlE7WYAkRehPzWIcUZf1XqGkI-ce158m10c0RlXh64IsEU0V2sv---Jl8N-wm83Xo23rDv7qMC8_ZgqYbxCb8gQzVK9Sa6WKObIiv3KOpvI8YWCyEMPXjsYoW0NBprurYUp&s=nORFqXBridY67j2VwRiCnAQ3DKQ>.

Task 2, Identification of Conversational Hate-Speech in Code-Mixed Languages 
(ICHCL), addresses the challenge of identifying hate speech and offensive 
content in code-mixed conversations on social media. Code-mixed text includes 
multiple languages within a single conversation. The task is divided into two 
subtasks.

  *   In Task 2a, participants need to perform binary classification on 
conversational tweets with tree-structured data. They must determine whether a 
tweet, comment, or reply contains hate speech, offensive language, or profanity 
(HOF) or is non-hate and offensive (NOT). The classification should consider 
both the individual content and support for hate expressed in the parent tweet.
  *   Task 2b involves the classification of conversational tweets with 
tree-structured data into specific forms of hate. Participants must identify if 
the tweet, comment, or reply contains standalone hate (SHOF), contextual hate 
(CHOF) that supports hate expressed in the parent, or if it is non-hate (NONE).
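One way to make the parent context available to a classifier is to walk from a post up to the root of its conversation tree. The sketch below is only an illustration: the `Post` record, its field names, and the `context_chain` helper are hypothetical and do not reflect the actual format of the released data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Post:
    # Hypothetical record layout; the real dataset may be structured differently.
    post_id: str
    text: str
    parent_id: Optional[str] = None  # None for the root tweet


def context_chain(post_id: str, posts: Dict[str, Post]) -> List[str]:
    """Return the texts from the root tweet down to the given post."""
    chain: List[str] = []
    seen = set()  # guards against accidental cycles in malformed data
    current = posts.get(post_id)
    while current is not None and current.post_id not in seen:
        seen.add(current.post_id)
        chain.append(current.text)
        current = posts.get(current.parent_id) if current.parent_id else None
    chain.reverse()  # root first, target post last
    return chain


posts = {
    "t1": Post("t1", "root tweet text"),
    "c1": Post("c1", "comment text", parent_id="t1"),
    "r1": Post("r1", "reply text", parent_id="c1"),
}
print(context_chain("r1", posts))
```

The resulting list could be joined into a single input so that a model sees the reply together with the tweet it responds to; how to combine them is left open here.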

For more details, please visit Task 2 
webpage.<https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F1690315200%2F1qOOEH-000CON-4S%7Cin6f%7C57e1b682%7C10977208%7C9441127%7C64C02A350B98BF72E7D5D52BC3310027&o=%2Fphtk%3A%2Futstl01gani..%2F.en%2FomlackgaQ3cC3KWA3A7oIjI0BCuz5GrEbnXK0YnlINTMPPdrr_X~PcFlF77uHN2da8HxHvcdJf06x3jV-bm5tis8JY8FYsAbnRn98PzzG~bp2fcV5f1ze3iC1rcrZTfSAceIyf9T75A3g3CkkT-bnWf3UsB6kH~mUdRa&s=WtKiQda0FckS0p0KDCBdRk_QxtE>

Task 3 aims to detect hateful spans within a sentence already considered 
hateful. A hate span is a set of continuous tokens that, in tandem, communicate 
the explicit hatefulness in a sentence.

  *   For instance, in the statement, "Women ... Can't live with them... Can't 
shoot them," the final portion, "Can't shoot them," is considered a hateful 
span. This shared task aims to extract all such spans from a hateful text.
  *   The input texts are all in English. The detection of hateful spans is 
framed as a sequence labeling problem. For every token of the sequences, we 
have manually annotated the start and end of a hateful span using BIO tagging, 
where "B" marks the beginning of a hate span, "I" marks its continuation, and 
"O" marks a non-hate token. The task is then to learn the correct sequence of 
BIO tags for a given sentence. For example, in the above sentence, the tag 
sequence for the preprocessed sentence will be of the form "women can't live 
with them can't shoot them" → "O O O O O B I I". An "I" tag cannot exist on its 
own and will always be preceded by either an "I" or a "B". Conversely, a "B" 
tag can be immediately followed by an "O" when the span is just a single word.
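The BIO constraints above can be checked, and the spans recovered, with a few lines of Python. This is a minimal sketch, not the organizers' tooling: the function names `is_valid_bio` and `extract_spans` are made up for illustration, and tokens are assumed to be whitespace-separated.

```python
def is_valid_bio(tags):
    """Return True if every tag is B, I, or O and no 'I' follows an 'O' or starts the sequence."""
    previous = "O"
    for tag in tags:
        if tag not in {"B", "I", "O"}:
            return False
        if tag == "I" and previous == "O":
            return False
        previous = tag
    return True


def extract_spans(tokens, tags):
    """Collect each hateful span as a string of contiguous tokens."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    if not is_valid_bio(tags):
        raise ValueError("invalid BIO sequence")
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:  # a new span starts right after another one
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I":
            current.append(token)
        else:  # "O" closes any open span
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


tokens = "women can't live with them can't shoot them".split()
tags = "O O O O O B I I".split()
print(extract_spans(tokens, tags))  # ["can't shoot them"]
```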

For more details, please visit Task 3 
webpage.<https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F1690315200%2F1qOOEH-000CON-4S%7Cin6f%7C57e1b682%7C10977208%7C9441127%7C64C02A350B98BF72E7D5D52BC3310027&o=%2Fphtk%3A%2Futstl01gani..%2F.en%2FomlacZyzjZQmnS5rUEIxoaw2FYcG25Z7J_gRHJJUcp4JKXOl4thC6COa9i~RG0N58ogF0DrXuL6YwRU2RjhX8HUMS6wBDbb6tMCc7cBhb9mlhYZJvCBxwmTxeJM01xT5VMX6LQQmNAmsnl2TrRez&s=Dw0BXsV3_dtoHi2T87rE7sFScrk>

Task 4 aims to detect hate speech in Bengali, Bodo, and Assamese languages. It 
is a binary classification task. Each dataset (for the three languages) 
consists of a list of sentences with their corresponding class (hate or 
offensive (HOF) or not hate (NOT)). Data is primarily collected from Twitter, 
Facebook, and YouTube comments.

The Macro F1 score will be the yardstick of the task. Team rank will be 
determined based on the Macro F1 score of the first part.
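For reference, Macro F1 is the unweighted mean of the per-class F1 scores. The sketch below follows that standard definition for the two labels HOF and NOT; it is an assumption that the official evaluation script computes the score the same way, and the `macro_f1` helper and the toy labels are illustrative only.

```python
def macro_f1(gold, predicted, labels=("HOF", "NOT")):
    """Unweighted mean of per-class F1 scores."""
    pairs = list(zip(gold, predicted))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


gold = ["HOF", "HOF", "NOT", "NOT"]
pred = ["HOF", "NOT", "NOT", "NOT"]
print(round(macro_f1(gold, pred), 4))  # 0.7333
```

Because every class contributes equally, a system that ignores the minority class is penalized even when its overall accuracy is high.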

For more details, please visit Task 4 
webpage.<https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F1690315200%2F1qOOEH-000CON-4S%7Cin6f%7C57e1b682%7C10977208%7C9441127%7C64C02A350B98BF72E7D5D52BC3310027&o=%2Fphti%3A%2Fstsg.teeoolsgem.c%2Fviwo%2F0oha3-22scln-athiani%2Fae-oeshhtem&s=hi9XoHnW5xc1PvQvk_kyIY5yH-Q>

Registration for all four tasks is open on our registration 
page.<https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F1690315200%2F1qOOEH-000CON-4S%7Cin6f%7C57e1b682%7C10977208%7C9441127%7C64C02A350B98BF72E7D5D52BC3310027&o=%2Fphtk%3A%2Futstl01gani..%2F.en%2FomlacSEbXAm5ASTyPZp~mwSToakJHxJUigj0TV53jJLP8YRpjnznqUd4TQ~URRk2BF08gL8rxoeodN08p7dnwO2EZCQ6PuQTSx3WgHiC3559Ohe7pr6jBJBqmYxk6crbMjbqJnDqtqEUC560feaATSu1bybrXJD9466xoaj3QsZ&s=bGdnMV6qIjoYsO7tOx7A2JtwHog>

We believe that your expertise and contribution will be invaluable in advancing 
the state of the art in hate speech classification. We encourage you to 
participate in these shared tasks and contribute to the research community.

Regards,
HASOC organizing team
