***Apologies for cross-postings***

We would like to announce the release of the Spoken Corpus of Cameroon Pidgin English, a pilot corpus consisting of 240,000 words of spoken Cameroon Pidgin English (CPE), a widely used yet stigmatised and largely uncodified pidgin/creole variety. This project was funded by a British Academy/Leverhulme grant (ref. SG140663).

The corpus consists of 80 .wav format sound recordings of private and public dialogues and monologues, each approximately 10-15 minutes in length. The recordings were made in five different locations in Cameroon (Bamenda, Buea, Douala, Kumba and Yaounde). Each sound file has two corresponding transcriptions (each around 3,000 words in length), one with mark-up only and the other with mark-up and POS-tagging. A tagset was devised and applied to the language, with TreeTagger reaching 94% accuracy.

Text categories and the proportions of monologue and dialogue are guided by those of the International Corpus of English (ICE) project, which makes the corpus immediately comparable with existing corpora of post-colonial varieties of English.

The corpus, which is freely accessible as a resource for linguistic description and comparison, is available at the Oxford Text Archive:


The accompanying documentation includes a list of participant data, a tagging guide and a word list/spelling guide.

Following successful completion of this pilot project, funding is currently being sought for the compilation of a larger (1M word) corpus of CPE.

Melanie Green (University of Sussex), Miriam Ayafor (University of Yaounde I), Gabriel Ozón (University of Sheffield)

Dr Gabriel Ozon
Lecturer in Applied Linguistics
Director of Student Support and Personal Tutoring (School of English)

School of English
University of Sheffield
1 Upper Hanover Street
Sheffield, S3 7RA
United Kingdom

Tel: +44 (0)114 222 8478
Web: www.sheffield.ac.uk/english/people/ozon


UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
