> I am looking for freely available English corpora that include lemmas of
> the words. Corpora would be used as a gold standard, so lemmas should be
> hand-annotated or at least human verified.
The Groningen Meaning Bank is a freely available (and redistributable)
corpus of written English, and it includes lemmas. The annotation is
automatic, but it has been partly corrected by humans: http://gmb.let.rug.nl/
There should be a way to find out which documents have had their lemma
information corrected, e.g. by checking the individual annotations (Bits of
Wisdom, in their lingo).