Dear Nupic list,
I've been spamming the list over the past few days because I am trying to get
more familiar with the theory and practice of HTM. I have come up with a use
case that I would find very interesting to try: character-based text
normalization/correction.

I envision the scenario as follows:
1. Learn "correct" text as temporal patterns (one character follows another).
2a. Error detection: estimate the anomaly score of each character in new
data, given the preceding characters.
2b. Normalization/correction: predict the next character from the
preceding characters.
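
To make the scenario concrete, here is a minimal sketch in Python. Note that
it does not use NuPIC or HTM at all: a plain character n-gram count table
stands in for the sequence model, only to show the data flow I have in mind
(learn, score, predict). The class name CharContextModel, the order
parameter, and the training sentence are all made up for illustration.

```python
from collections import Counter, defaultdict


class CharContextModel:
    """Hypothetical stand-in for a sequence model (NOT HTM).

    It counts which character follows each context of up to `order`
    preceding characters.
    """

    def __init__(self, order=3):
        if order < 1:
            raise ValueError("order must be at least 1")
        self.order = order
        self.counts = defaultdict(Counter)

    def _context(self, preceding):
        # Keep only the last `order` characters of the preceding text.
        return preceding[-self.order:]

    def learn(self, text):
        # Step 1: learn "correct" text as character-follows-context counts.
        for i, char in enumerate(text):
            self.counts[self._context(text[:i])][char] += 1

    def anomaly(self, preceding, char):
        # Step 2a: 0.0 means "always seen after this context",
        # 1.0 means "never seen after this context" (or unknown context).
        followers = self.counts.get(self._context(preceding))
        if not followers:
            return 1.0
        return 1.0 - followers[char] / sum(followers.values())

    def predict(self, preceding):
        # Step 2b: most frequent follower of the context, or None if unknown.
        followers = self.counts.get(self._context(preceding))
        if not followers:
            return None
        return followers.most_common(1)[0][0]


model = CharContextModel(order=3)
model.learn("the cat sat on the mat. the cat ate the rat.")

noisy = "the cxt sat"
for i, char in enumerate(noisy):
    score = model.anomaly(noisy[:i], char)
    suggestion = model.predict(noisy[:i])
    print(repr(char), round(score, 2), repr(suggestion))
```

In the HTM version, I would expect the anomaly score and the prediction to
come from the model itself rather than from raw counts, but the loop over
characters would look similar.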

A couple of questions arise. For instance, I am not sure whether a purely
temporal (left-to-right) description is really applicable here. When I think
of a human reader solving the task, they would also look at the succeeding
characters, not only the preceding ones.
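
To illustrate what I mean by looking at both sides, here is a small sketch
(again plain Python counts, not HTM) that predicts a character from its left
and right neighbours. The function names and the width parameter are
hypothetical, and the replacement rule is deliberately crude: a character is
replaced only if it was never seen between the same neighbours in the
training text.

```python
from collections import Counter, defaultdict


def learn_two_sided(text, width=2):
    """Count which character occurs between each (left, right) context pair."""
    counts = defaultdict(Counter)
    for i in range(width, len(text) - width):
        left = text[i - width:i]
        right = text[i + 1:i + 1 + width]
        counts[(left, right)][text[i]] += 1
    return counts


def correct(text, counts, width=2):
    """Replace characters never seen between their neighbours in training."""
    chars = list(text)
    for i in range(width, len(chars) - width):
        # The left context uses characters that may already be corrected.
        left = "".join(chars[i - width:i])
        right = "".join(chars[i + 1:i + 1 + width])
        seen = counts.get((left, right))
        if seen and chars[i] not in seen:
            chars[i] = seen.most_common(1)[0][0]
    return "".join(chars)


counts = learn_two_sided("the cat sat on the mat")
print(correct("the cxt sat", counts))
```

A strictly temporal model only ever sees the left half of this context, which
is why I am unsure whether it fits the task.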

Another point is the model design. Again thinking of how a human would solve
this, and of how that maps onto HTM theory (as far as I understand it), a
human probably looks at a whole word and the surrounding words to predict a
single letter. This suggests having one layer that processes character input
and at least one on top that processes words, and perhaps another one that
processes longer sequences of words.
Or should any of this logic be reflected in another region?

Am I roughly on the right track with the theoretical part?

For the implementation, I envision the following steps:
1. Swarm over the data (some correct text) to find model parameters.
BTW, could this step also answer the question about the model design
regarding regions and layers?
2. Implement an anomaly detection model.
...

Could any of the experienced users give an assessment of how feasible this sounds?
Thanks!
Carsten



-- 
Carsten Schnober
Doctoral Researcher
Ubiquitous Knowledge Processing (UKP) Lab
FB 20 / Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone [+49] (0)6151 16-6227, fax -5455, room S2/02/B111
[email protected]
www.ukp.tu-darmstadt.de

Web Research at TU Darmstadt (WeRC): www.werc.tu-darmstadt.de
GRK 1994: Adaptive Preparation of Information from Heterogeneous Sources
(AIPHES): www.aiphes.tu-darmstadt.de
PhD program: Knowledge Discovery in Scientific Literature (KDSL)
www.kdsl.tu-darmstadt.de
