Dear NuPIC list, I've been spamming the list over the past few days because I am trying to get more familiar with the theory and practice of HTM. I have come up with a use case that I would find very interesting to try: character-based text normalization/correction.
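To make the use case concrete, here is a minimal sketch of character-level error detection and next-character prediction. It uses a plain character n-gram counter as a stand-in for the learned temporal model; it is not HTM and does not use the NuPIC API. The names `CharNgramModel`, `anomaly`, `predict`, and `score_text` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


class CharNgramModel:
    """Toy stand-in for a sequence learner (NOT HTM / NuPIC):
    counts which character follows each short context."""

    def __init__(self, order=3):
        if order < 1:
            raise ValueError("order must be at least 1")
        self.order = order  # number of preceding characters used as context
        self.counts = defaultdict(Counter)

    def train(self, text):
        """Learn from 'correct' text: record the follower of every context."""
        for i, char in enumerate(text):
            context = text[max(0, i - self.order):i]
            self.counts[context][char] += 1

    def predict(self, preceding):
        """Most frequent next character for the context, or None if unseen."""
        followers = self.counts.get(preceding[-self.order:])
        if not followers:
            return None
        return followers.most_common(1)[0][0]

    def anomaly(self, preceding, char):
        """Score in [0, 1]: 0 = always seen after this context, 1 = never seen."""
        followers = self.counts.get(preceding[-self.order:])
        if not followers:
            return 1.0  # unseen context: treat as maximally anomalous
        return 1.0 - followers[char] / sum(followers.values())


def score_text(model, text):
    """Anomaly score of each character given only its preceding characters."""
    return [model.anomaly(text[:i], char) for i, char in enumerate(text)]


if __name__ == "__main__":
    model = CharNgramModel(order=3)
    model.train("the cat sat on the mat. the cat ate the rat.")
    for char, score in zip("the cqt", score_text(model, "the cqt")):
        print(repr(char), round(score, 2))
```

Like the scenario described in this mail, the sketch only looks at preceding characters, so it shares the limitation raised in the questions: it cannot use the characters that follow.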
I envision the scenario as follows:

1. Learn "correct" text as temporal patterns (one character follows another).
2a. Error detection: estimate the anomaly of each character given the preceding characters (in new data).
2b. Normalization/correction: predict the next character given the preceding characters.

A couple of questions arise. For instance, I am not sure whether the temporal description is really applicable here. A human reader solving this task would also look at the succeeding characters, not only the preceding ones.

Another point is the model design. Again thinking of human solutions aligned with HTM theory (to my understanding, at least), a human probably looks at a whole word and the surrounding words to predict a single letter. To my understanding, this suggests having one layer that processes character input and at least one layer on top that processes words, and perhaps another one that processes longer sequences of words. Or should any of this logic be reflected in another region?

Am I approximately right about the theoretical part?

For the implementation, I envision the following steps:

1. Swarm over the data (some correct text) to create model parameters. BTW, could this step answer the question about the model design regarding regions and layers?
2. Implement an anomaly detection model ...

Could any of the experienced users tell me how this sounds?

Thanks!
Carsten

--
Carsten Schnober
Doctoral Researcher
Ubiquitous Knowledge Processing (UKP) Lab
FB 20 / Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone [+49] (0)6151 16-6227, fax -5455, room S2/02/B111
[email protected]
www.ukp.tu-darmstadt.de

Web Research at TU Darmstadt (WeRC): www.werc.tu-darmstadt.de
GRK 1994: Adaptive Preparation of Information from Heterogeneous Sources (AIPHES): www.aiphes.tu-darmstadt.de
PhD program: Knowledge Discovery in Scientific Literature (KDSL): www.kdsl.tu-darmstadt.de
