Hi Matej,

 

I wouldn’t want to swear that they are 100% correct, but the GUM corpus 
(https://corpling.uis.georgetown.edu/gum/) contains lemmas that were produced 
by the TreeTagger and then manually corrected. Unlike the POS tags, however, 
they were not produced and adjudicated by humans from scratch, so there may be 
errors: only a single human went over them, except for some cases that required 
discussion due to POS tag adjudication. Take a look here; the lemmas are 
easiest to see in the xml/ directory:

 

https://github.com/amir-zeldes/gum 

 

Best,

Amir

------------

Dr. Amir Zeldes

Asst. Prof. of Computational Linguistics

Department of Linguistics

Georgetown University

1437 37th St. NW

Washington, DC 20057

 

http://corpling.uis.georgetown.edu/amir

 

 

From: corpora-boun...@uib.no [mailto:corpora-boun...@uib.no] On Behalf Of matej 
martinc
Sent: Monday, September 19, 2016 5:33 AM
To: corpora@uib.no
Subject: [Corpora-List] English corpora with lemmas

 

Hello everybody,

 

I am looking for freely available English corpora that include lemmas for each 
word. The corpora would be used as a gold standard, so the lemmas should be 
hand-annotated or at least human-verified. 

 

So far I have only found the British National Corpus: http://www.natcorp.ox.ac.uk/

 

Any suggestions for other available corpora would be helpful. Thanks!

 

Kind regards,

Matej Martinc

