Hi mraptor,
I recommend taking a look at a new paper of ours:
Scott Purdy, "Encoding Data for HTM Systems": http://arxiv.org/abs/1602.05925.

In your temperature encoding example, knowing the range of possible values a 
priori is indeed useful. You would simply use two independent scalar encoders, 
one for each of the two scenarios -- habitable temperatures and chemical 
reaction temperatures. The random distributed scalar encoder [1] comes in handy 
when the temperature ranges are not known; it dynamically adjusts the range as 
the min and/or max change with new data.
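To make the two-encoder idea concrete, here's a minimal sketch of a bucketed scalar encoder in plain Python (a toy illustration of the general technique, not the actual NuPIC implementation; the function name, ranges, and parameters below are all assumptions for the example):

```python
def scalar_encode(value, minval, maxval, n=100, w=11):
    """Toy scalar encoder: map `value` in [minval, maxval] onto a binary
    vector of length n with w contiguous active bits. Nearby values
    produce overlapping bit patterns, which is the property HTM needs."""
    value = max(minval, min(maxval, value))  # clip to the known range
    # Position of the first active bit, scaled across the available slots
    start = int(round((value - minval) / float(maxval - minval) * (n - w)))
    return [1 if start <= i < start + w else 0 for i in range(n)]

# Two independent encoders, one per scenario (ranges are made up here):
habitable = scalar_encode(21.0, minval=-20, maxval=50)
reaction = scalar_encode(450.0, minval=100, maxval=1000)
```

Each encoding always has exactly w active bits, and close temperatures within a scenario share most of them; the random distributed scalar encoder achieves the same overlap property without requiring minval/maxval up front.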

Regarding word embeddings, you're correct that "distributed representation is a 
result of the context of usage"; the underlying assumption in state-of-the-art 
word embedding methods is that words appearing in similar contexts have similar 
meanings. Sliding a window through some corpus of text (with various tricks) 
yields dense distributed representations, learned specifically for use in a 
deep learning network. Similarly, Cortical.io [2] creates sparse distributed 
representations (SDRs), which are potentially more useful in a range of NLP 
tasks, and can be used as input to HTM models. To encode text into SDRs, check 
out the Python [3] or Java [4] clients for their API.
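A quick sketch of why SDRs are convenient for NLP: semantic similarity falls out of simple bit overlap. The word SDRs below are toy values I made up for illustration (real Cortical.io fingerprints are 16,384-bit vectors returned by their API):

```python
# Hypothetical word SDRs, represented as sets of active bit indices.
cat = {3, 17, 42, 88, 120}
dog = {3, 17, 55, 88, 301}
car = {9, 200, 311, 415, 512}

def overlap(a, b):
    """Similarity between two SDRs = number of shared active bits."""
    return len(a & b)

overlap(cat, dog)  # 3 shared bits -> semantically close
overlap(cat, car)  # 0 shared bits -> unrelated
```

Because overlap is both the similarity metric and exactly what HTM's spatial pooler and temporal memory respond to, these representations plug into HTM models without any extra distance computation.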

Hopefully this info will help clear up any confusion you may have!

[1] 
https://github.com/numenta/nupic/blob/master/src/nupic/encoders/random_distributed_scalar.py
[2] http://www.cortical.io/technology.html
[3] https://github.com/cortical-io/retina-sdk.py
[4] https://github.com/cortical-io/retina-api-java-sdk

Cheers,
Alex

Alexander Lavin
Software Engineer
Numenta
