Hi mraptor, I recommend taking a look at a new paper of ours: Scott Purdy, "Encoding Data for HTM Systems": http://arxiv.org/abs/1602.05925.
In your temperature encoding example, knowing the range of possible values a priori is indeed useful. You would simply use two independent scalar encoders, one for each of the two scenarios: habitable temperatures and chemical reaction temperatures. The random distributed scalar encoder [1] comes in handy when the temperature ranges are not known; it dynamically adjusts the range as the min and/or max change with new data.

Regarding word embeddings, you're correct that "distributed representation is a result of the context of usage"; the underlying assumption in state-of-the-art word embedding methods is that words appearing in similar contexts have similar meanings. By sliding a window through a corpus of text (with various tricks), dense distributed representations are learned, specifically for use in a deep learning network. Similarly, Cortical.io [2] creates sparse distributed representations (SDRs), which are potentially more useful in a range of NLP tasks and can be used as input to HTM models. To encode text into SDRs, check out the Python [3] or Java [4] clients for their API.

Hopefully this info will help clear up any confusion you may have!

[1] https://github.com/numenta/nupic/blob/master/src/nupic/encoders/random_distributed_scalar.py
[2] http://www.cortical.io/technology.html
[3] https://github.com/cortical-io/retina-sdk.py
[4] https://github.com/cortical-io/retina-api-java-sdk

Cheers,
Alex

Alexander Lavin
Software Engineer
Numenta
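P.S. If it helps, here's a toy sketch of the scalar-encoder idea: a value in a known range maps to a fixed-width bit array with a contiguous run of active bits, so nearby values share bits. This is my own simplified illustration (function name and parameters are made up, not NuPIC's API), and the clipping at the range boundaries is exactly the limitation the random distributed scalar encoder avoids:

```python
def encode_scalar(value, minval, maxval, n=40, w=5):
    """Toy scalar encoder: return n bits with w contiguous 1s positioned by value."""
    # Clip to the known range -- a priori knowledge of the range is required here,
    # which is what the random distributed scalar encoder relaxes.
    value = max(minval, min(maxval, value))
    # The fraction of the way through the range sets where the run of 1s starts.
    fraction = (value - minval) / float(maxval - minval)
    start = int(round(fraction * (n - w)))
    return [1 if start <= i < start + w else 0 for i in range(n)]

# Two independent encoders for the two temperature scenarios,
# each with its own range:
habitable = encode_scalar(21.0, minval=-30, maxval=50)   # habitable temperatures
reaction = encode_scalar(450.0, minval=0, maxval=1000)   # chemical reaction temperatures
```

Because the active bits overlap for nearby values, similar temperatures produce similar encodings, which is the semantic-similarity property HTM encoders need.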
