Hi Carsten,
I'm glad you're looking to use NuPIC for NLP. Here's a motivating example from 
our fall 2013 hackathon: [1].

A couple of reasons I would recommend against a character-level model:
  1. People appear to read at the level of whole words rather than individual 
characters; see the scrambled-text example at 
http://www.brainhq.com/brain-resources/brain-teasers/scrambled-text
  2. A character-level TM (temporal memory) would essentially memorize the 
character sequences it sees, so it would not generalize to new data. A model 
constrained to, e.g., a single book chapter may work well on that chapter, but 
it would do poorly on any other chapter of the book. That is, there are far too 
many character sequences in human language to learn.

The theoretical points you raise on human language are accurate. 
State-of-the-art deep learning models use methods such as sliding windows over 
text inputs, bidirectional RNNs (which process text both forwards and 
backwards), and/or stacked RNNs.
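
To make the sliding-window idea concrete, here is a minimal sketch in plain 
Python over word tokens. The function name sliding_windows, the window size of 
3, and the sample sentence are my own illustrative choices, not taken from 
NuPIC or any particular deep learning library. Reversing each window is only a 
rough stand-in for the "backwards" pass of a bidirectional model.

```python
def sliding_windows(tokens, size, step=1):
    """Return every run of `size` consecutive tokens, advancing by `step`."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    # If there are fewer tokens than `size`, the range is empty and we return [].
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, step)]


# Hypothetical sample input, split on whitespace into word tokens.
words = "the quick brown fox jumps".split()

# Forward view: overlapping windows of three words each.
word_windows = sliding_windows(words, size=3)

# Backward view: the same windows with their word order reversed.
reversed_windows = [window[::-1] for window in word_windows]

for forward, backward in zip(word_windows, reversed_windows):
    print(forward, backward)
```

Each window would then be encoded and fed to the model as one input; working 
on words rather than characters keeps the windows short and meaningful.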

I recommend playing around with the Cortical.io API [2]; they offer a Python 
client [3] for querying things like word and text encodings.

[1] https://www.youtube.com/watch?v=X4XjYXFRIAQ&start=7084
[2] http://api.cortical.io/
[3] https://github.com/cortical-io/python-client-sdk

Cheers,
Alex
