Hi Matej,


I wouldn’t swear that they are 100% correct, but the GUM corpus 
(https://corpling.uis.georgetown.edu/gum/) contains lemmas that were manually 
corrected from TreeTagger output. Unlike the POS tags, however, they were not 
produced and adjudicated by humans from scratch, so there may be errors: only 
a single annotator went over them, except for some cases that required 
discussion during POS tag adjudication. Take a look here; the lemmas are 
easiest to see in the xml/ directory:

Dr. Amir Zeldes
Asst. Prof. of Computational Linguistics
Department of Linguistics
Georgetown University
1437 37th St. NW
Washington, DC 20057

From: corpora-boun...@uib.no [mailto:corpora-boun...@uib.no] On Behalf Of matej 
Sent: Monday, September 19, 2016 5:33 AM
To: corpora@uib.no
Subject: [Corpora-List] English corpora with lemmas


Hello everybody,


I am looking for freely available English corpora that include word lemmas. 
The corpora would be used as a gold standard, so the lemmas should be 
hand-annotated or at least human-verified.


So far I have only found the British National Corpus: http://www.natcorp.ox.ac.uk/


Any suggestions about other available corpora would be helpful. Thanks!


Kind regards,

Matej Martinc

