Hi Edward,

I have been using two libraries that are based on the Rapid Automatic Keyword 
Extraction (RAKE) algorithm and the Natural Language Toolkit (nltk) to derive 
keywords. So far, the results have been interesting but less than stellar.

https://pypi.org/project/multi-rake/
https://pypi.org/project/rake-nltk/

What I like about this approach is that it analyzes word frequency and 
co-occurrence to return key phrases (not just the most frequent keywords), 
which may better represent the subject of the source text.
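
To illustrate the idea, here is a minimal sketch of RAKE-style scoring using 
only the Python standard library. It is not the code from multi-rake or 
rake-nltk: the function names (candidate_phrases, rake_scores) and the tiny 
stopword list are assumptions made up for this example, and the real 
libraries use much larger stopword lists and more careful tokenization. 
Candidate phrases are the runs of words between stopwords and punctuation; 
each word is scored as degree / frequency, and a phrase scores the sum of its 
word scores.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption for this sketch only).
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "by", "for", "from", "in", "is",
    "it", "of", "on", "or", "that", "the", "this", "to", "was", "were", "with",
}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    # Punctuation (anything but word characters, whitespace, ' and -) ends a phrase.
    for fragment in re.split(r"[^\w\s'-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                # A stopword also ends the current phrase.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text):
    """Return (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how often each word appears in any phrase
    degree = defaultdict(int)  # total length of the phrases the word appears in
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {
        " ".join(phrase): sum(word_score[w] for w in phrase)
        for phrase in set(phrases)
    }
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = (
        "Text mining of dissertation abstracts. "
        "Text mining is useful for keyword extraction."
    )
    for phrase, score in rake_scores(sample):
        print(f"{score:4.1f}  {phrase}")
```

Because words that appear inside longer phrases get a higher degree, 
multi-word phrases tend to outrank single frequent words, which is why the 
output reads more like subject terms than a plain word count does.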

Best,

Ian

Ian Matzen
 He/Him/His
 Systems and Digital Initiatives Librarian
 Westfield State University
 Westfield, MA 01086-1630
 (413) 351 9178
 imat...@westfield.ma.edu | westfield.ma.edu

On Oct 22, 2020, at 2:25 PM, Edward M. Corrado <ecorr...@ecorrado.us> wrote:

Hello,

I have a set of just over 60,000 theses and dissertations abstracts that I
want to automatically create keywords/topics from. Does anyone have any
recommendations for text mining or other tools to start with?

Regards,
Edward
