Re: [CODE4LIB] hands-on workshop on natural language processing & text mining

Chris Gray Thu, 16 Nov 2017 08:21:21 -0800

Eric,

You might be interested in something I ran across recently. Aditya Parameswaran (http://data-people.cs.illinois.edu/) gave a talk on our campus about the efforts of a group he participates in that is aimed at "simplifying and improving data analytics, i.e., helping users make better use of their data". He wrote a recent blog post for O'Reilly on "Enabling Data Science for the Majority" (https://www.oreilly.com/ideas/enabling-data-science-for-the-majority), which was the topic of the talk I heard.

He introduced 3 of the 6 projects his team has been working on: DataSpread, Zenvisage, and OrpheusDB, all aimed at what they call "HILDA" ("human-in-the-loop data analytics"). The 3 projects listed have homes on GitHub and are linked from Aditya's page under "Quick Project Links". At the talk, he said they have hosted versions running and are looking for beta testers. There is a live demo of DataSpread at http://kite.cs.illinois.edu:8080/.


Chris

On 2017-11-09 01:13 PM, Eric Lease Morgan wrote:

I’m thinking about a hands-on workshop on natural language processing & text 
mining, below, and your feedback is desired.  —ELM


Natural language processing & text mining using freely available tools: "No programming necessary"

This text outlines a hands-on natural language processing & text mining workshop.

It is possible to do simple & rudimentary natural language processing & text mining 
with a set of freely available tools. No programming is necessary. This workshop 
facilitates hands-on exercises demonstrating how this can be done. By participating in this 
workshop, students & researchers will be able to:

  * identify patterns, anomalies, and trends in their texts
  * practice both "distant" and "scalable" reading
  * enhance & complement their ability to do "close" reading
  * use & understand a corpus of poetry or prose at scale

Activities in the workshop include:

  * learning what natural language processing is, and why you should care
  * articulating a research question
  * creating a corpus
  * creating a plain text version of a corpus with Tika [1]
  * using Voyant Tools to do some "distant" reading [2]
  * using a concordance (AntConc) to facilitate searching keywords in context [3]
  * creating a simple word list with a text editor
  * cleaning & analyzing word lists with OpenRefine [4]
  * charting & graphing word lists with Tableau Public [5]
  * increasing meaning by extracting parts-of-speech with the Stanford POS Tagger [6]
  * increasing meaning further by extracting named entities with the Stanford NER [7]
  * identifying themes and clustering documents using MALLET [8]
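To make the word-list and concordance steps above a bit more concrete, here is a minimal Python sketch of what they compute. It is an illustration only, under simple assumptions (lowercased alphabetic tokens, whitespace-delimited context windows); the function names are hypothetical, and this is not how AntConc, Voyant, or OpenRefine work internally.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens.

    Assumption: a word is a run of ASCII letters, optionally with one
    internal apostrophe (e.g. "don't"); real tools tokenize more carefully.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def word_list(text, stopwords=frozenset()):
    """Return (word, count) pairs, most frequent first, skipping stop words."""
    counts = Counter(t for t in tokenize(text) if t not in stopwords)
    return counts.most_common()

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines: up to `width` words on each side of every hit."""
    words = text.split()
    lines = []
    for i, w in enumerate(words):
        # Ignore case and leading/trailing punctuation when matching the keyword
        if re.sub(r"^\W+|\W+$", "", w).lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{w}] {right}".strip())
    return lines

if __name__ == "__main__":
    sample = "The whale swam. The sea was calm, and the whale sang."
    print(word_list(sample, stopwords={"the", "and", "was"}))
    for line in kwic(sample, "whale", width=2):
        print(line)
```

Run on the sample sentence, the word list is [('whale', 2), ('swam', 1), ('sea', 1), ('calm', 1), ('sang', 1)] and the concordance prints "The [whale] swam. The" and "and the [whale] sang.". Dropping stop words before counting is what keeps "the" and "and" from dominating the list, which is the same cleanup a participant would otherwise do by hand in OpenRefine.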

Anybody with sets of texts can benefit from this workshop. Any corpus of 
textual content is apropos: journal articles, books, the complete run of a 
magazine, blog postings, Tweets, press releases, conference proceedings, 
websites, poetry, etc. This workshop is operating-system agnostic (Windows, Linux, 
Macintosh). All the software used in this workshop is freely available on the 
'Net, or it is already installed on one's computer. Active participation 
requires zero programming, but students must bring their own computer, and they 
must not be afraid of their computer's command line interface.

This workshop will not make participants experts in natural language 
processing, but it will empower them to make better sense of large sets of 
textual information.

[1] Tika - http://tika.apache.org
[2] Voyant - http://voyant-tools.org
[3] AntConc - http://www.laurenceanthony.net/software/antconc/
[4] OpenRefine - http://openrefine.org
[5] Tableau Public - https://public.tableau.com/
[6] POS Tagger - https://nlp.stanford.edu/software/tagger.shtml
[7] NER - https://nlp.stanford.edu/software/CRF-NER.shtml
[8] MALLET - http://mallet.cs.umass.edu
