>> I am looking for freely available English corpora that include lemmas of the 
>> words. Corpora would be used as a gold standard, so lemmas should be 
>> hand-annotated or at least human verified.

>> So far I have only found the British National Corpus: http://www.natcorp.ox.ac.uk/

All of the BYU corpora (http://corpus.byu.edu) are based directly on the 
original PoS tagging and lemmatization in the BNC, including the 
520-million-word COCA corpus, the 1.9-billion-word GloWbE Corpus, and the new 
NOW corpus (3.3 billion words, and growing by 4-5 million words a day).

In addition to the free web-based interface, the corpus data is also available 
in downloadable full-text format: http://corpus.byu.edu/full-text/, including 
free samples (~2 million words each from COCA, COHA, and GloWbE).

The lemmatization was subsequently corrected for the word frequency data that 
is based on these corpora (http://www.wordfrequency.info/), which includes the 
top 60,000 lemmas in COCA, and the top 100,000 word forms (+PoS and lemmas) in 
COCA, COHA, BNC, and SOAP. In both cases, the word frequency / lemma lists were 
manually verified.
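
For anyone who wants to use such a list as a gold standard, here is a minimal sketch of how a lemmatizer could be scored against it. It assumes the data has been exported as tab-separated lines of word form, lemma, and PoS tag; that column layout, the sample rows, and the names read_gold, lemma_accuracy, and naive_lemmatize are all illustrative assumptions, not the actual file format or tooling of the corpora above.

```python
import io
from typing import Callable, Iterable, Iterator, Tuple

Row = Tuple[str, str, str]  # (word form, gold lemma, PoS tag)


def read_gold(lines: Iterable[str]) -> Iterator[Row]:
    """Yield (word, lemma, pos) from tab-separated lines, skipping malformed rows."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[0] and fields[1]:
            yield fields[0], fields[1], fields[2]


def lemma_accuracy(rows: Iterable[Row],
                   lemmatize: Callable[[str, str], str]) -> float:
    """Return the share of rows where the predicted lemma equals the gold lemma."""
    total = 0
    correct = 0
    for word, gold, pos in rows:
        total += 1
        if lemmatize(word, pos).lower() == gold.lower():
            correct += 1
    return correct / total if total else 0.0


def naive_lemmatize(word: str, pos: str) -> str:
    """Hypothetical baseline: lowercase, and strip a final 's' from noun tags."""
    w = word.lower()
    if pos.startswith("n") and w.endswith("s") and len(w) > 3:
        return w[:-1]
    return w


# Made-up sample rows in the assumed word / lemma / PoS layout.
SAMPLE = (
    "corpora\tcorpus\tnn2\n"
    "words\tword\tnn2\n"
    "found\tfind\tvvn\n"
    "the\tthe\tat\n"
)

rows = list(read_gold(io.StringIO(SAMPLE)))
print(lemma_accuracy(rows, naive_lemmatize))
```

To evaluate a real lemmatizer, pass it in place of naive_lemmatize and read the rows from an open file instead of the in-memory sample.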

Best,

Mark Davies


============================================
Mark Davies
Professor of Linguistics / Brigham Young University
http://davies-linguistics.byu.edu/

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================

________________________________
From: corpora-boun...@uib.no <corpora-boun...@uib.no> on behalf of matej 
martinc <matejm...@gmail.com>
Sent: Monday, September 19, 2016 3:32 AM
To: corpora@uib.no
Subject: [Corpora-List] English corpora with lemmas

Hello everybody,

I am looking for freely available English corpora that include lemmas of the 
words. Corpora would be used as a gold standard, so lemmas should be 
hand-annotated or at least human verified.

So far I have only found the British National Corpus: http://www.natcorp.ox.ac.uk/

Any suggestions for other available corpora would be helpful. Thanks!

Kind regards,
Matej Martinc