Hi Alex,

On 20.10.2015 at 20:38, Alex Lavin wrote:
> Hi Carsten,
> I was pointing you to the Cortical.io API in case you opted for a word-
> or text-level model. For a character-level model you would simply use
> random, one-hot representations such that semantic similarities aren't
> encoded into the SDRs. You can use the CategoryEncoder [1] for this.
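For illustration, a minimal sketch in plain numpy (not the NuPIC CategoryEncoder API itself; names and parameters here are my own) of what such random, semantics-free character SDRs could look like:

```python
import string

import numpy as np


def build_char_sdrs(chars, n=1024, w=21, seed=42):
    """Assign each character a fixed random SDR of w active bits out of n.

    Distinct characters share only chance overlap, so no semantic
    similarity is encoded -- the point Alex makes above.
    """
    rng = np.random.default_rng(seed)
    sdrs = {}
    for c in chars:
        active = rng.choice(n, size=w, replace=False)
        sdr = np.zeros(n, dtype=np.int8)
        sdr[active] = 1
        sdrs[c] = sdr
    return sdrs


sdrs = build_char_sdrs(string.ascii_lowercase)
# every SDR has exactly w active bits, and 'a' and 'b' get unrelated codes
assert all(int(s.sum()) == 21 for s in sdrs.values())
assert not np.array_equal(sdrs["a"], sdrs["b"])
```

With a fixed seed the mapping is reproducible across runs, which matters if the encodings must stay stable over a long training corpus.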
Thanks, that is valuable information for getting started!

> The feasibility of the TM learning character-level sequences depends on
> how you define the learned sequences, and subsequently how many there
> are. That is, is a single sequence defined as a word, a sentence, a
> paragraph? Do the sequences repeat in the training? If not, the TM won't
> learn them.

My approach would be trial and error. There is an almost unlimited amount of training data available (e.g. Gutenberg books).

The sequence length is another point about which I have some doubts from a theoretical point of view. In a neural network approach, I would try various maximum lengths (e.g. 5, 10, 50, 100, ...) and expect the model to fit the weights so that overly long sequences do no harm, apart from the computational cost. Can I do the same in HTM, computational cost aside? That is, set the sequence length to an arbitrarily large number and expect the anomaly detection to take only the relevant predecessors of a character into account?

> The TM should have sufficient capacity given you have enough cells per
> column [2] and segments per cell [3]. TM is a memory of sequences, but
> fundamentally it learns transitions between inputs. Thus the capacity is
> measured by how many transitions a TM region can store. For example, a
> TM region of 2% column activation (i.e. sparsity), 32 cells per column,
> and 128 segments per cell can store approximately (32/0.02)*128 =
> 204,800 transitions.

Thanks again, these are valuable details!

Carsten

--
Carsten Schnober
Doctoral Researcher
Ubiquitous Knowledge Processing (UKP) Lab
FB 20 / Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone [+49] (0)6151 16-6227, fax -5455, room S2/02/B111
[email protected]
www.ukp.tu-darmstadt.de
Web Research at TU Darmstadt (WeRC): www.werc.tu-darmstadt.de
GRK 1994: Adaptive Preparation of Information from Heterogeneous Sources (AIPHES): www.aiphes.tu-darmstadt.de
PhD program: Knowledge Discovery in Scientific Literature (KDSL): www.kdsl.tu-darmstadt.de
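PS: As a quick sanity check of the capacity estimate quoted above, the arithmetic (cells per column divided by column sparsity, times segments per cell) can be reproduced directly:

```python
# Transition-capacity estimate from Alex's example:
# (cells per column / column sparsity) * segments per cell
cells_per_column = 32
sparsity = 0.02        # 2% column activation
segments_per_cell = 128

capacity = (cells_per_column / sparsity) * segments_per_cell
print(round(capacity))  # 204800
```

(`round` rather than `int` because 0.02 is not exactly representable in binary floating point, so the product lands a hair below 204800.)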
