Hi Alex,

On 20.10.2015 at 20:38, Alex Lavin wrote:
> Hi Carsten,
> I was pointing you to the Cortical.io API in case you opted for a word-
> or text-level model. For a character-level model you would simply use
> random, one-hot representations such that semantic similarities aren’t
> encoded into the SDRs. You can use the CategoryEncoder [1] for this.

Thanks, that is valuable information for getting started!
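
To make sure I understand the idea, here is a minimal sketch in plain
Python of what such an encoding could look like. It is not the NuPIC
CategoryEncoder itself; the function name build_char_encoder, the width
w and the reserved "unknown" block are my own assumptions. Each
character gets its own disjoint block of active bits, so no two
characters share bits and no similarity is encoded.

```python
import string


def build_char_encoder(alphabet, w=3):
    """Return (encode, n): encode maps a character to a list of n bits.

    Hypothetical helper, not a NuPIC API. Every character owns a
    disjoint block of w active bits; block 0 is reserved for characters
    that are not in the alphabet.
    """
    index = {ch: i for i, ch in enumerate(sorted(set(alphabet)))}
    n = w * (len(index) + 1)

    def encode(ch):
        block = index.get(ch, -1) + 1  # unknown characters map to block 0
        bits = [0] * n
        for j in range(block * w, (block + 1) * w):
            bits[j] = 1
        return bits

    return encode, n


encode, n = build_char_encoder(string.ascii_lowercase + " ")
a, b = encode("a"), encode("b")
# Number of active bits shared by the two representations.
overlap = sum(x & y for x, y in zip(a, b))
```

With disjoint blocks the overlap between any two different characters
is zero by construction, which is the property the quoted advice asks
for.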

> The feasibility of the TM learning character-level sequences depends on
> how you define the learned sequences, and subsequently how many there
> are. That is, is a single sequence defined as a word, a sentence, a
> paragraph? Do the sequences repeat in the training? If not, the TM won’t
> learn them.

My approach would be trial and error. There is an almost unlimited amount
of training data available (e.g. Gutenberg books).
The sequence length is another thing I have some doubts about from a
theoretical point of view. In a neural network approach, I would try
various maximum lengths (e.g. 5, 10, 50, 100, ...) and expect the model
to fit the weights so that overly long sequences do no harm, apart from
the computational cost.
Can I do the same in HTM, leaving the computational cost aside? That is,
can I set the sequence length to an arbitrarily large number and expect
the anomaly detection to take only the relevant predecessors of a
character into account?
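
For the trial-and-error part, this is roughly how I would prepare the
training text for the different maximum lengths. It is only a sketch of
the data preparation; char_sequences is a hypothetical helper of mine,
and how a sequence boundary would be signalled to the TM is not covered
here.

```python
def char_sequences(text, max_len):
    """Split text into consecutive chunks of at most max_len characters."""
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]


sample = "the quick brown fox"
# One list of chunks per candidate maximum sequence length.
chunks_by_len = {
    max_len: char_sequences(sample, max_len)
    for max_len in (5, 10, 50, 100)
}
```

A maximum length larger than the text simply yields the whole text as a
single sequence.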


> The TM should have sufficient capacity given you have enough cells per
> column [2] and segments per cell [3]. TM is a memory of sequences, but
> fundamentally it learns transitions between inputs. Thus the capacity is
> measured by how many transitions a TM region can store. For example, a
> TM region of 2% column activation (i.e. sparsity), 32 cells per column,
> and 128 segments per cell can store approximately (32/0.02)*128 =
> 204,800 transitions.

Thanks again, these are valuable details!
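
For my own notes, the estimate quoted above written out as a small
function. The function and parameter names are mine, and the formula is
only the approximation given in your mail, not an exact capacity bound.

```python
def tm_transition_capacity(cells_per_column, sparsity, segments_per_cell):
    """Approximate number of transitions a TM region can store.

    Implements the rule of thumb from the quoted mail:
    (cells per column / column sparsity) * segments per cell.
    """
    return round((cells_per_column / sparsity) * segments_per_cell)


# Values from the quoted example: 2% sparsity, 32 cells, 128 segments.
capacity = tm_transition_capacity(32, 0.02, 128)
```

This lets me compare the estimate against the number of distinct
transitions in a given training corpus.
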
Carsten


-- 
Carsten Schnober
Doctoral Researcher
Ubiquitous Knowledge Processing (UKP) Lab
FB 20 / Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone [+49] (0)6151 16-6227, fax -5455, room S2/02/B111
[email protected]
www.ukp.tu-darmstadt.de

Web Research at TU Darmstadt (WeRC): www.werc.tu-darmstadt.de
GRK 1994: Adaptive Preparation of Information from Heterogeneous Sources
(AIPHES): www.aiphes.tu-darmstadt.de
PhD program: Knowledge Discovery in Scientific Literature (KDSL)
www.kdsl.tu-darmstadt.de
