On 16.10.2015 at 21:00, Carin Meier wrote:
> I have played around with Stanford CoreNLP
> http://nlp.stanford.edu/software/corenlp.shtml. It is supposed to be of
> higher quality than NLTK. https://github.com/gigasquid/stanford-talk
Hi everyone,

I am a researcher in the field of NLP and recently got interested in HTM. This seems like an excellent thread to join. :)

Regarding POS tagging and other linguistic analysis and pre-processing: I am not sure what NLTK uses, but the Stanford NLP Toolkit is considered state of the art. There are plenty of other components that might be interesting, but they live in a Java world.

I don't want to spam, but this might be interesting here: in our lab, we use (and develop) the UIMA-based DKPro framework [1] for linguistic analysis and pre-processing. It lets you easily plug different components (such as tokenization, POS tagging, etc.) together into a custom pipeline, including the Stanford NLP tools.

A way out of the Python world here might be to store the data after pre-processing in a simple text file, with whitespace-separated tokens etc. The latest DKPro snapshot provides a TokenizedTextWriter [2] class for that purpose.

Anyway, I am very curious about this specific task since it seems like a very good practical entry point into the HTM world for me.

Best regards,
Carsten

[1] https://dkpro.github.io/dkpro-core/
[2] https://github.com/dkpro/dkpro-core/blob/master/de.tudarmstadt.ukp.dkpro.core.io.text-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/io/text/TokenizedTextWriter.java

--
Carsten Schnober, M.Sc.
Doctoral Researcher
Ubiquitous Knowledge Processing (UKP Lab)
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone (0)6151 16-6227, room S2/02/B111
www.ukp.tu-darmstadt.de
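To make the hand-off concrete: assuming the Java-side writer emits one sentence per line with tokens separated by whitespace (the file name and the helper function below are my own illustration, not part of DKPro), reading such a file back on the Python side could be as simple as:

```python
# Sketch: load a tokenized corpus written as plain text,
# one sentence per line, tokens separated by whitespace.
def read_tokenized(path):
    """Return a list of sentences, each a list of token strings."""
    sentences = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = line.split()  # splits on any whitespace, drops empty strings
            if tokens:              # skip blank lines
                sentences.append(tokens)
    return sentences
```

From there the sentences can be fed into whatever Python-side processing (NLTK, an HTM encoder, etc.) without touching the Java world again.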
