Also, just for reference, I posted these in another thread, but you
might find this useful:

http://lists.numenta.org/pipermail/nupic_lists.numenta.org/2015-October/012040.html
---------
Matt Taylor
OS Community Flag-Bearer
Numenta


On Wed, Oct 21, 2015 at 12:29 AM, Carsten Schnober
<[email protected]> wrote:
> Hi Alex,
>
> On 20.10.2015 at 20:38, Alex Lavin wrote:
>> Hi Carsten,
>> I was pointing you to the Cortical.io API in case you opted for a word-
>> or text-level model. For a character-level model you would simply use
>> random, one-hot representations such that semantic similarities aren’t
>> encoded into the SDRs. You can use the CategoryEncoder [1] for this.
>
> Thanks, that is valuable information for getting started!
>
>> The feasibility of the TM learning character-level sequences depends on
>> how you define the learned sequences, and subsequently how many there
>> are. That is, is a single sequence defined as a word, a sentence, a
>> paragraph? Do the sequences repeat in the training? If not, the TM won’t
>> learn them.
>
> My approach would be trial and error. There is an almost unlimited
> amount of training data available (e.g. Gutenberg books).
> The sequence length is another thing I have some doubts about from a
> theoretical point of view. In a neural network approach, I would try
> various maximum lengths (e.g. 5, 10, 50, 100, ...) and expect the model
> to fit the weights so that overly long sequences do no harm, apart
> from the computational cost.
> Can I do the same in HTM, not considering the computational cost? Set
> the sequence length to an arbitrarily large number and expect the
> anomaly detection to just take the relevant predecessors for a character
> into account?
>
>
>> The TM should have sufficient capacity given you have enough cells per
>> column [2] and segments per cell [3]. TM is a memory of sequences, but
>> fundamentally it learns transitions between inputs. Thus the capacity is
>> measured by how many transitions a TM region can store. For example, a
>> TM region of 2% column activation (i.e. sparsity), 32 cells per column,
>> and 128 segments per cell can store approximately (32/0.02)*128 =
>> 204,800 transitions.
>
> Thanks again, these are valuable details!
> Carsten
>
>
> --
> Carsten Schnober
> Doctoral Researcher
> Ubiquitous Knowledge Processing (UKP) Lab
> FB 20 / Computer Science Department
> Technische Universität Darmstadt
> Hochschulstr. 10, D-64289 Darmstadt, Germany
> phone [+49] (0)6151 16-6227, fax -5455, room S2/02/B111
> [email protected]
> www.ukp.tu-darmstadt.de
>
> Web Research at TU Darmstadt (WeRC): www.werc.tu-darmstadt.de
> GRK 1994: Adaptive Preparation of Information from Heterogeneous Sources
> (AIPHES): www.aiphes.tu-darmstadt.de
> PhD program: Knowledge Discovery in Scientific Literature (KDSL)
> www.kdsl.tu-darmstadt.de
>
