On 16.10.2015 at 21:00, Carin Meier wrote:
> I have played around with Stanford CoreNLP
> http://nlp.stanford.edu/software/corenlp.shtml.  It is supposed to be a
> higher quality than NLTK https://github.com/gigasquid/stanford-talk

Hi everyone,
I am a researcher in the field of NLP and recently got interested in
HTM. This seems an excellent thread to join. :)

Regarding POS tagging and other linguistic analysis and pre-processing:
I am not sure about what NLTK uses, but the Stanford NLP Toolkit is
considered state of the art. There are plenty of other components that
might be interesting, but they live in a Java world.

I don't want to spam, but this might be interesting here: in our lab, we
use (and develop) the UIMA-based DKPro framework [1] for linguistic
analysis and pre-processing. It makes it easy to plug different
components (such as tokenization, POS tagging, etc.), including
Stanford NLP, together into a custom pipeline.
One way to bridge the Java and Python worlds here might be to store the
pre-processed data in a simple text file with whitespace-separated
tokens. The latest DKPro snapshot provides a TokenizedTextWriter [2]
class for that purpose.
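
On the Python side, reading such a file back could look roughly like
the sketch below. It assumes one sentence per line with tokens
separated by whitespace; I have not checked that this matches the
writer's output in every configuration. The function name
read_tokenized and the sample text are made up for illustration.

```python
import io


def read_tokenized(stream):
    """Yield one list of tokens per non-empty line of the stream."""
    for line in stream:
        tokens = line.split()  # split on any run of whitespace
        if tokens:             # skip blank lines
            yield tokens


# Hypothetical sample standing in for the pre-processed text file.
sample = io.StringIO("I like HTM .\nIt is interesting .\n")
sentences = list(read_tokenized(sample))
print(sentences)
```

For a real file, the same function can be fed an open file handle,
e.g. open(path, encoding="utf-8"), where path points to wherever the
pipeline wrote its output.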

Anyway, I am very curious about this specific task since it seems like a
very good practical entry point into the HTM world for me.

Best regards,
Carsten


[1] https://dkpro.github.io/dkpro-core/
[2] https://github.com/dkpro/dkpro-core/blob/master/de.tudarmstadt.ukp.dkpro.core.io.text-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/io/text/TokenizedTextWriter.java

-- 
Carsten Schnober, M.Sc.
Doctoral Researcher
Ubiquitous Knowledge Processing (UKP Lab)
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone (0)6151 16-6227, room S2/02/B111
www.ukp.tu-darmstadt.de
