[Corpora-List] HASOC 2023 tasks - Call for Participation - Hate Speech and Offensive Content Identification

Dola Mullage, Damith P. via Corpora Tue, 25 Jul 2023 13:07:13 -0700

15th meeting of Forum for Information Retrieval Evaluation HASOC-2023

We are excited to announce the 5th edition of HASOC, consisting of four 
interesting shared tasks. We invite you to participate.

Task 1 focuses on identifying hate speech, offensive language, and profanity in
different languages using natural language processing techniques.

* Task 1A focuses on identifying hate and offensive content in Sinhala, a
low-resource Indo-Aryan language spoken mainly in Sri Lanka. The task involves
classifying tweets into Hate and Offensive (HOF) or Non-Hate and Offensive
(NOT). The training set for this task is based on the Sinhala Offensive
Language Detection dataset, which contains 10,000 tweets.
* Task 1B focuses on identifying hate and offensive content in Gujarati,
another low-resource Indo-Aryan language spoken by approximately 50 million
people in India. Similarly, participants need to classify tweets into HOF or
NOT categories. The training set for this task consists of around 200 tweets.

For more details, please visit the Task 1
page<https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F1690315200%2F1qOOEH-000CON-4S%7Cin6f%7C57e1b682%7C10977208%7C9441127%7C64C02A350B98BF72E7D5D52BC3310027&o=%2Fphtk%3A%2Futstl01gani..%2F.en%2FomlacSZ3yixCb7bte6oxpFh8nz4kSfS5ewxlE7WYAkRehPzWIcUZf1XqGkI-ce158m10c0RlXh64IsEU0V2sv---Jl8N-wm83Xo23rDv7qMC8_ZgqYbxCb8gQzVK9Sa6WKObIiv3KOpvI8YWCyEMPXjsYoW0NBprurYUp&s=nORFqXBridY67j2VwRiCnAQ3DKQ>.

Task 2, Identification of Conversational Hate-Speech in Code-Mixed Languages
(ICHCL), addresses the challenge of identifying hate speech and offensive
content in code-mixed conversations on social media. Code-mixed text includes
multiple languages within a single conversation. The task is divided into two
subtasks.

* In Task 2a, participants need to perform binary classification on
conversational tweets with tree-structured data. They must determine whether a
tweet, comment, or reply contains hate speech, offensive language, or profanity
(HOF) or is non-hate and offensive (NOT). The classification should consider
both the individual content and support for hate expressed in the parent tweet.
* Task 2b involves the classification of conversational tweets with
tree-structured data into specific forms of hate. Participants must identify if
the tweet, comment, or reply contains standalone hate (SHOF), contextual hate
(CHOF) that supports hate expressed in the parent, or if it is non-hate (NONE).
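As a rough illustration of what "tree-structured" means here, the sketch below models a conversation as posts that point to their parent and joins a reply with its ancestors before classification. The Post class, its field names, and the " [SEP] " separator are hypothetical choices made for this example; they are not the official ICHCL data format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Post:
    """One tweet, comment, or reply (hypothetical structure, not the task format)."""
    post_id: str
    text: str
    parent: Optional["Post"] = None


def with_context(post: Post, sep: str = " [SEP] ") -> str:
    """Join the post with all of its ancestors, ordered from root to the post."""
    chain: List[str] = []
    node: Optional[Post] = post
    while node is not None:
        chain.append(node.text)
        node = node.parent
    return sep.join(reversed(chain))


tweet = Post("t1", "original tweet")
comment = Post("c1", "a comment", parent=tweet)
reply = Post("r1", "a reply", parent=comment)

# The classifier input for the reply now carries its parent context.
print(with_context(reply))
```

Feeding the joined string to a classifier is only one possible way to let a model see the parent content; other designs are equally plausible.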

For more details, please visit the Task 2
webpage.<https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F1690315200%2F1qOOEH-000CON-4S%7Cin6f%7C57e1b682%7C10977208%7C9441127%7C64C02A350B98BF72E7D5D52BC3310027&o=%2Fphtk%3A%2Futstl01gani..%2F.en%2FomlackgaQ3cC3KWA3A7oIjI0BCuz5GrEbnXK0YnlINTMPPdrr_X~PcFlF77uHN2da8HxHvcdJf06x3jV-bm5tis8JY8FYsAbnRn98PzzG~bp2fcV5f1ze3iC1rcrZTfSAceIyf9T75A3g3CkkT-bnWf3UsB6kH~mUdRa&s=WtKiQda0FckS0p0KDCBdRk_QxtE>

Task 3 aims to detect hateful spans within a sentence already considered
hateful. A hate span is a set of continuous tokens that, in tandem, communicate
the explicit hatefulness in a sentence.

* For instance, in the statement, "Women ... Can't live with them... Can't
shoot them," the portion "Can't shoot them" will be considered a hateful span.
This shared task aims to extract all such spans from a hateful text.
* The input texts are all in English. The detection of hateful spans is
achieved by mapping this into a sequence labeling problem. For every token of
the sequences, we have manually annotated the start and end of a hateful span.
This is achieved by the BIO notation tagging, where 'B' represents the beginning
of the hate span, 'I' forms the continuation of a hate span, and 'O' represents
the non-hate tag. The task is then to learn the correct sequence of the BIO
tags for a given sentence. For example, in the above sentence, the tag sequence
for the preprocessed sentence will be of the form "women can't live with them
can't shoot them" → "O O O O O B I I". An "I" tag cannot exist on its own and
will always be preceded by either an "I" or a "B". Conversely, a "B" tag
can be immediately followed by an "O" when the span is just a single word.
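The BIO rules described above can be sketched in a few lines of Python. This is a minimal illustration, assuming whitespace tokenization and the bare tags "B", "I", and "O"; the function names is_valid_bio and extract_spans are made up for this example and are not part of any official HASOC tooling.

```python
from typing import List


def is_valid_bio(tags: List[str]) -> bool:
    """Check that every tag is B, I, or O and that no 'I' follows an 'O' or starts the sequence."""
    prev = "O"
    for tag in tags:
        if tag not in {"B", "I", "O"}:
            return False
        if tag == "I" and prev == "O":
            return False
        prev = tag
    return True


def extract_spans(tokens: List[str], tags: List[str]) -> List[str]:
    """Return each hateful span as a string of space-joined tokens."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    if not is_valid_bio(tags):
        raise ValueError("invalid BIO sequence")
    spans: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new span starts; close any span that was still open.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I":
            current.append(token)
        else:
            # 'O' closes an open span, if there is one.
            if current:
                spans.append(" ".join(current))
                current = []
    if current:
        spans.append(" ".join(current))
    return spans


tokens = "women can't live with them can't shoot them".split()
tags = "O O O O O B I I".split()
print(extract_spans(tokens, tags))
```

For the example sentence, the only span recovered is "can't shoot them". A sequence such as "O I" is rejected, while "B O" is accepted as a single-word span.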

For more details, please visit the Task 3
webpage.<https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F1690315200%2F1qOOEH-000CON-4S%7Cin6f%7C57e1b682%7C10977208%7C9441127%7C64C02A350B98BF72E7D5D52BC3310027&o=%2Fphtk%3A%2Futstl01gani..%2F.en%2FomlacZyzjZQmnS5rUEIxoaw2FYcG25Z7J_gRHJJUcp4JKXOl4thC6COa9i~RG0N58ogF0DrXuL6YwRU2RjhX8HUMS6wBDbb6tMCc7cBhb9mlhYZJvCBxwmTxeJM01xT5VMX6LQQmNAmsnl2TrRez&s=Dw0BXsV3_dtoHi2T87rE7sFScrk>

Task 4 aims to detect hate speech in Bengali, Bodo, and Assamese. It is a
binary classification task. Each of the three datasets consists of a list of
sentences with their corresponding class: hate or offensive (HOF) or not hate
(NOT). Data is primarily collected from Twitter, Facebook, and YouTube
comments.

The Macro F1 score will be the yardstick of the task. Team rank will be
determined based on the Macro F1 score of the first part.
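For readers unfamiliar with the metric, Macro F1 is the unweighted mean of the per-class F1 scores, so the HOF and NOT classes count equally regardless of how many examples each has. The sketch below computes it by hand; the function name macro_f1, the toy labels, and the convention of scoring a class as 0.0 when it has no true positives, false positives, or false negatives are assumptions of this example, and the organizers' evaluation script may differ.

```python
from typing import Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str],
             labels: Sequence[str] = ("HOF", "NOT")) -> float:
    """Unweighted mean of per-class F1 scores."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # Assumed convention: a class with no support and no predictions scores 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels invented for illustration only.
gold = ["HOF", "HOF", "NOT", "NOT"]
pred = ["HOF", "NOT", "NOT", "NOT"]
print(round(macro_f1(gold, pred), 4))
```

In this toy case the HOF class has F1 = 2/3 and the NOT class has F1 = 4/5, so the macro average is about 0.7333.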

For more details, please visit the Task 4
webpage.<https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F1690315200%2F1qOOEH-000CON-4S%7Cin6f%7C57e1b682%7C10977208%7C9441127%7C64C02A350B98BF72E7D5D52BC3310027&o=%2Fphti%3A%2Fstsg.teeoolsgem.c%2Fviwo%2F0oha3-22scln-athiani%2Fae-oeshhtem&s=hi9XoHnW5xc1PvQvk_kyIY5yH-Q>

Registration for all four tasks is open on our registration
page.<https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F1690315200%2F1qOOEH-000CON-4S%7Cin6f%7C57e1b682%7C10977208%7C9441127%7C64C02A350B98BF72E7D5D52BC3310027&o=%2Fphtk%3A%2Futstl01gani..%2F.en%2FomlacSEbXAm5ASTyPZp~mwSToakJHxJUigj0TV53jJLP8YRpjnznqUd4TQ~URRk2BF08gL8rxoeodN08p7dnwO2EZCQ6PuQTSx3WgHiC3559Ohe7pr6jBJBqmYxk6crbMjbqJnDqtqEUC560feaATSu1bybrXJD9466xoaj3QsZ&s=bGdnMV6qIjoYsO7tOx7A2JtwHog>

We believe that your expertise and contribution will be invaluable in advancing
the state of the art in hate speech classification. We encourage you to
participate in these shared tasks and contribute to the research community.

Regards,
HASOC organizing team
