On Dec 13, 2017, at 8:58 AM, Rory Litwin <[email protected]> wrote:

> http://libraryjuiceacademy.com/112-digital-humanities.php


Interesting! Fun!! Good luck, sincerely.

Based on some of my experience, one of the impediments to doing “digital 
humanities” is the process of coercing one’s data/information into a format a 
computer can manipulate. To address this problem, I have hacked away & sketched 
out a few Web-based tools:

  * Extract plain text 
(http://dh.crc.nd.edu/sandbox/nlp-clients/tika-client.cgi) - Given a PDF (or 
just about any other file type), return plain text. The result of this process 
is the basis for just about everything else. Open it in a text editor. 
Find/replace spaces with newlines. Normalize case. Sort. Open it in a 
spreadsheet to count & tabulate. Open it in a concordance. Feed it to Voyant. 
Etc. (See the first sketch below.)

  * POS client (http://dh.crc.nd.edu/sandbox/nlp-clients/pos-client.cgi) - 
Given a plain text file, return ngrams, parts-of-speech, and lemmas in a number 
of formats. Again, the results of this tool can be fed to spreadsheets, 
databases, or visualization tools such as Wordle, OpenRefine, or Tableau, or to 
a graphing tool like Gephi. (See the second sketch below.)

  * NER client (http://dh.crc.nd.edu/sandbox/nlp-clients/ner-client.cgi) - 
Working much like the POS client: given a plain text file, return lists of 
named entities found in the text. (See the third sketch below.)
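
For example, the find/replace-and-count routine outlined under the first tool 
boils down to a few lines of Python. This is only a sketch, and it assumes the 
extracted plain text has already been saved to a local file:

  # count_words.py - normalize case, split into words, and tabulate
  # frequencies, much like the text editor + spreadsheet routine above
  import sys
  from collections import Counter

  with open(sys.argv[1], encoding='utf-8') as handle:
      words = handle.read().lower().split()

  # count & tabulate, most frequent words first
  for word, count in Counter(words).most_common(25):
      print(f'{count}\t{word}')

Run it with something like "python count_words.py mytext.txt", where mytext.txt 
is whatever plain text the Tika client returned, and then paste the output into 
a spreadsheet, a concordance, Voyant, etc.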
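
Similarly, here is a rough, local approximation of what the POS client 
returns. NLTK is my choice for the sake of illustration, not necessarily what 
the client uses behind the scenes, and the usual NLTK data sets ('punkt', 
'averaged_perceptron_tagger', 'wordnet') are assumed to be downloaded:

  # pos_sketch.py - parts-of-speech, lemmas, and bigrams from a plain text file
  import sys
  import nltk
  from nltk.stem import WordNetLemmatizer
  from nltk.util import ngrams

  with open(sys.argv[1], encoding='utf-8') as handle:
      tokens = nltk.word_tokenize(handle.read())

  # token, part-of-speech, lemma -- tab-delimited for spreadsheets & databases
  lemmatizer = WordNetLemmatizer()
  for token, tag in nltk.pos_tag(tokens):
      print(f'{token}\t{tag}\t{lemmatizer.lemmatize(token)}')

  # bigrams, one per line, ready for Wordle, Tableau, Gephi, etc.
  for bigram in ngrams(tokens, 2):
      print(' '.join(bigram))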
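
And the NER client can be approximated in much the same way; again, NLTK is 
only a stand-in for whatever the client actually runs, and the 
'maxent_ne_chunker' and 'words' data sets are assumed to be downloaded:

  # ner_sketch.py - list the named entities (and their types) found in a text
  import sys
  import nltk
  from nltk.tree import Tree

  with open(sys.argv[1], encoding='utf-8') as handle:
      text = handle.read()   # keep the original case; it helps the chunker

  # chunk the tagged tokens, then pull out the named-entity subtrees
  for subtree in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(text))):
      if isinstance(subtree, Tree):
          entity = ' '.join(word for word, tag in subtree.leaves())
          print(f'{subtree.label()}\t{entity}')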

It is not possible to create a generic tool that will support the actual 
analysis of text, because the analysis of text is particular to each scholar. I 
am only able to provide the data/information. I believe it is up to the scholar 
to do the evaluation.

Feel free to give the tool(s) a go, but your mileage will vary.

—
Eric Morgan
