Also, just for reference, I posted this in another thread, but you might find it useful:
http://lists.numenta.org/pipermail/nupic_lists.numenta.org/2015-October/012040.html

---------
Matt Taylor
OS Community Flag-Bearer
Numenta

On Wed, Oct 21, 2015 at 12:29 AM, Carsten Schnober <[email protected]> wrote:
> Hi Alex,
>
> On 20.10.2015 at 20:38, Alex Lavin wrote:
>> Hi Carsten,
>> I was pointing you to the Cortical.io API in case you opted for a word-
>> or text-level model. For a character-level model you would simply use
>> random, one-hot representations so that semantic similarities aren't
>> encoded into the SDRs. You can use the CategoryEncoder [1] for this.
>
> Thanks, that is valuable information for getting started!
>
>> The feasibility of the TM learning character-level sequences depends on
>> how you define the learned sequences, and consequently how many there
>> are. That is, is a single sequence defined as a word, a sentence, or a
>> paragraph? Do the sequences repeat in the training? If not, the TM won't
>> learn them.
>
> My approach would be trial and error. There is an almost unlimited amount
> of training data available (e.g. Gutenberg books).
> The sequence length is another thing about which I have some doubts from
> a theoretical point of view. In a neural-network approach, I would try
> various maximum lengths (e.g. 5, 10, 50, 100, ...) and expect the model
> to fit the weights so that overly long sequences do no harm, apart
> from the computational cost.
> Can I do the same in HTM, leaving the computational cost aside? Set
> the sequence length to an arbitrarily large number and expect the
> anomaly detection to take only the relevant predecessors of a character
> into account?
>
>> The TM should have sufficient capacity given you have enough cells per
>> column [2] and segments per cell [3]. The TM is a memory of sequences, but
>> fundamentally it learns transitions between inputs. Thus its capacity is
>> measured by how many transitions a TM region can store. For example, a
>> TM region with 2% column activation (i.e. sparsity), 32 cells per column,
>> and 128 segments per cell can store approximately (32/0.02)*128 =
>> 204,800 transitions.
>
> Thanks again, these are valuable details!
> Carsten
>
> --
> Carsten Schnober
> Doctoral Researcher
> Ubiquitous Knowledge Processing (UKP) Lab
> FB 20 / Computer Science Department
> Technische Universität Darmstadt
> Hochschulstr. 10, D-64289 Darmstadt, Germany
> phone [+49] (0)6151 16-6227, fax -5455, room S2/02/B111
> [email protected]
> www.ukp.tu-darmstadt.de
>
> Web Research at TU Darmstadt (WeRC): www.werc.tu-darmstadt.de
> GRK 1994: Adaptive Preparation of Information from Heterogeneous Sources
> (AIPHES): www.aiphes.tu-darmstadt.de
> PhD program: Knowledge Discovery in Scientific Literature (KDSL)
> www.kdsl.tu-darmstadt.de
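For anyone following along: the character encoding Alex describes (non-overlapping, category-style SDRs so no semantic similarity is implied) can be sketched in a few lines of plain Python. This is just an illustration of the idea, not the actual nupic CategoryEncoder API; the names and the width parameter `w` here are made up for the example.

```python
def build_char_encoder(alphabet, w=21):
    """Assign each character its own block of w contiguous active bits.

    Because the blocks never overlap, encodings of different characters
    share zero bits -- i.e., no semantic similarity is encoded, which is
    exactly what you want for arbitrary character categories.
    """
    return {ch: set(range(i * w, (i + 1) * w))
            for i, ch in enumerate(alphabet)}

encoder = build_char_encoder("abcdefghijklmnopqrstuvwxyz ")
print(len(encoder["a"]))                 # each SDR has w active bits
print(encoder["a"] & encoder["b"])       # no shared bits between categories
```

In real NuPIC code you would hand the CategoryEncoder your character list and let it manage the bit layout, but the zero-overlap property is the same.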
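And to make the capacity arithmetic from Alex's last point explicit, here is the same back-of-the-envelope calculation as a small helper (the function name is mine, the formula is the one quoted above):

```python
def tm_capacity(cells_per_column, column_sparsity, segments_per_cell):
    """Approximate transitions a TM region can store:
    (cells per column / column sparsity) * segments per cell.
    """
    return (cells_per_column / column_sparsity) * segments_per_cell

# The example from the thread: 2% sparsity, 32 cells/column, 128 segments/cell.
print(tm_capacity(32, 0.02, 128))  # 204800.0
```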
