Dear colleagues,

We are proud to announce the release of CROWN2021, a new Brown family
corpus of American English, together with six comparable corpora of
Catalan, Danish, German, Farsi/Persian, Finnish and Italian. Dozens of
similar corpora will follow in the next few months.

CROWN2021 is a balanced Brown family American English corpus of one million
words containing texts published in 2021. It was developed under the
leadership of Prof. Jiajin Xu and the texts were collected by Mingchen Sun
and 12 other graduate students at Beijing Foreign Studies University
(BFSU). CROWN2021 serves as an updated language resource of present-day
American written English, and a reference corpus for contrastive studies
involving diachronic variation (with Brown, Frown, Crown), regional
variation (with LOB, FLOB, CLOB) and cross-linguistic comparison (with
LCMC, ToRCH family corpora, GLOBE family corpora).

Users can access the online version of CROWN2021 and other BFSU-made
Brown family corpora at the BFSU CQPweb Corpus Portal
(http://114.251.154.212/cqp/). The user ID and passcode are both "test".

KEY INFORMATION

Project leader: Jiajin Xu of the National Research Centre for Foreign
Language Education (NRCFLE), BFSU
Text collectors: Mingchen Sun (359 texts), Yagang Chen (47 texts), Shujuan
Deng (21 texts), Tingyan Zhangchen (19 texts), Meijia Hao (15 texts),
Xingke Lv (13 texts), Jiaxi Shen (5 texts), Yuanyuan Lin (4 texts), Junyu
Mao (4 texts), Xinzhi Yang (4 texts), Zinuo Zuo (4 texts), Xinkai Deng (3
texts), Ruotong Zha (2 texts)
Time of compilation: April 2022 - October 2022
Size: Approximately one million words
Language: Contemporary American English
Number of texts/samples: 500 samples of 2000+ words each (short texts are
pieced together to form one 2000-word sample, but are saved separately and
marked with A, B, C etc. in the filenames)
Sampling strategy: The Brown Corpus model (see:
http://korpus.uib.no/icame/manuals/BROWN/INDEX.HTM)
Period: The texts were published in 2021.
Released in: November 2022
POS TagSet: The BNC Basic (C5) Tagset
POS Tagger: TreeTagger
Lemmatiser: TreeTagger
Sentence Segmenter: spaCy

How to cite:
Mingchen Sun, Jiajin Xu et al. 2022. The CROWN2021 Corpus. National
Research Centre for Foreign Language Education, Beijing Foreign Studies
University.

Related work:
Xu, Jiajin & Maocheng Liang. 2013. A tale of two C's: Comparing English
varieties with Crown and CLOB (The 2009 Brown family corpora). ICAME
Journal 37: 175-183. http://icame.uib.no/ij37/Pages_175-184.pdf

Jiajin Xu
Professor
Beijing Foreign Studies University
_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to [email protected]
