Hi Martin,

I'm not sure if this will help, but if your student is interested in doing 
something with richly annotated Gutenberg data, there is a very deeply 
annotated corpus including about 0.5M tokens of samples from Project Gutenberg 
novels here, next to data from 7 other genres:

https://github.com/gucorpling/amalgum/blob/dev/amalgum/fiction/dep/AMALGUM_fiction_adams.conllu

The data is automatically annotated with good quality neural UD parses, 
coreference resolution, entity recognition, discourse parses and more, with 
excerpts from over 400 novels included. We also have a much smaller but 
manually annotated corpus which includes fiction, along with other genres in 
our GUM/GENTLE corpora (24 genres total):

https://gucorpling.org/gum/ 

Hope these are useful,
Amir
------------
Dr. Amir Zeldes
Assoc. Prof. of Computational Linguistics
Department of Linguistics
Georgetown University
1437 37th St. NW
Washington, DC 20057

https://gucorpling.org/amir 

-----Original Message-----
From: Martin Wynne via Corpora <[email protected]> 
Sent: Sunday, October 27, 2024 8:11 AM
To: [email protected]
Subject: [Corpora-List] Corpora of English novels

I have a student who is interested in tracing the development of the English 
novel from its origins to the present day (or at least to the start of the 
twentieth century), and I'm trying to gather information about relevant corpora 
covering this text type and period.

We know about the European Literary Text Collection (ELTeC,
https://www.distant-reading.net/eltec/)
which will be very useful for the later end of the timescale. We also know it
is possible to assemble a corpus from Project Gutenberg, archive.org, Oxford 
Text Archive, etc.,
but would be interested in re-using any corpora that people might already
have made, which aim to be representative of particular periods within this 
genre.

The student has some flexibility with her research question, so while the 
original idea of 'English novels' was probably 'novels in English from Great 
Britain and Ireland', other related areas such as US novels might be 
interesting as well.

Any tips and suggestions gratefully received. If we get a number of interesting 
direct emails, I'll be happy to summarize the results to the list.

Best wishes,
Martin

--
Senior Researcher in Corpus Linguistics
Faculty of Linguistics, Philology and Phonetics, University of Oxford
National Co-ordinator, CLARIN-UK
[email protected]
https://orcid.org/0000-0002-4155-0530

_______________________________________________
Corpora mailing list -- [email protected] 
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to [email protected]

