Hi Edward,

I have been using two libraries that are based on the Rapid Automatic Keyword Extraction (RAKE) algorithm and the Natural Language Toolkit (NLTK) to derive keywords. So far, the results have been interesting but less than stellar.
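For context, here is a rough sketch of the RAKE idea in plain Python. It is not the code used by either library: the tiny stopword list and the function name `rake_keywords` are my own illustrative assumptions. Candidate phrases are the runs of words between stopwords and punctuation; each word is scored as degree divided by frequency, and a phrase's score is the sum of its word scores.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption); real use needs a fuller one.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "by", "for", "from", "in", "is",
    "it", "of", "on", "or", "that", "the", "to", "was", "were", "with",
}


def rake_keywords(text, stopwords=STOPWORDS):
    """Return (score, phrase) pairs, highest score first."""
    phrases = []
    # Punctuation always ends a candidate phrase.
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stopwords:
                # A stopword also ends the current candidate phrase.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))

    freq = defaultdict(int)    # how often each word occurs in candidates
    degree = defaultdict(int)  # total length of the phrases containing it
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)

    # Phrase score = sum of degree/frequency over its words.
    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in set(phrases)
    }
    return sorted(((s, p) for p, s in scores.items()), reverse=True)


if __name__ == "__main__":
    sample = ("Keyword extraction from thesis abstracts. "
              "Keyword extraction is useful for thesis metadata.")
    for score, phrase in rake_keywords(sample):
        print(f"{score:.1f}  {phrase}")
```

Because longer phrases accumulate more degree, multi-word phrases tend to outrank single frequent words, which is why the output is key phrases rather than a plain frequency list.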
https://pypi.org/project/multi-rake/
https://pypi.org/project/rake-nltk/

What I like about this approach is that it analyzes frequency and co-occurrence to return key phrases (not just the most frequent keywords), which may better represent the subject of the source text.

Best,
Ian

Ian Matzen
He/Him/His
Systems and Digital Initiatives Librarian
Westfield State University
Westfield, MA 01086-1630
(413) 351 9178
imat...@westfield.ma.edu | westfield.ma.edu

On Oct 22, 2020, at 2:25 PM, Edward M. Corrado <ecorr...@ecorrado.us> wrote:

Hello,

I have a set of just over 60,000 thesis and dissertation abstracts that I want to automatically generate keywords/topics from. Does anyone have any recommendations for text mining or other tools to start with?

Regards,
Edward