Hi Wakan,

The documented code for the Temporal Memory algorithm is in nupic [1].
[1] https://github.com/numenta/nupic/blob/master/src/nupic/research/temporal_memory.py

- Chetan

> On Nov 10, 2015, at 8:54 AM, Wakan Tanka <[email protected]> wrote:
>
> Hello Scott,
> Where can I find further info about the Temporal Memory algorithm? I've read
> the white paper, but English is not my native language, so I'm a bit lost.
>
> Thank you very much.
>
>
> On 11/03/2015 11:53 PM, Scott Purdy wrote:
>> Thanks Daniel, that is a good explanation. The longer version can be
>> seen here:
>> http://numenta.com/learn/science-of-anomaly-detection.html
>>
>> And you probably want to understand the Temporal Memory algorithm as
>> well, since the anomaly score is built on top of that.
>>
>> On Tue, Nov 3, 2015 at 12:56 PM, Daniel McDonald
>> <[email protected]> wrote:
>>
>> There's one aspect of this thread that I don't feel has been touched
>> on, which may help in understanding prediction and the anomaly
>> score. I learned this at the spring hackathon in NY.
>>
>> If you look at how the anomaly score is implemented [1], you'll see
>> that it computes the ratio of the difference between the number of
>> active columns and the number of active columns that were also
>> predicted, to the number of active columns. That is,
>> (#active - #activeAndPredicted) / #active. Note that this formula
>> does not depend on the total number of predicted columns. In fact,
>> if the HTM predicts all columns, the anomaly score will be 0 for any
>> subsequent input. In this case, the HTM would be completely
>> uncertain about the next step in the sequence, so it predicts a
>> superposition of all possible patterns; therefore, any subsequent
>> input is not anomalous.
>>
>> That is exactly what happened to my Market Patterns hack at the
>> hackathon.
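[Editor's note: the formula Daniel describes can be sketched in a few lines of Python. This is a simplified stand-alone version, not the actual nupic code; the function name and the use of plain sets of column indices are assumptions for illustration.]

```python
def raw_anomaly_score(active, predicted):
    """Fraction of currently active columns that were NOT predicted.

    active    -- set of column indices active for the current input
    predicted -- set of column indices predicted at the previous step
    """
    if not active:
        return 0.0
    active_and_predicted = active & predicted
    # (#active - #activeAndPredicted) / #active
    return (len(active) - len(active_and_predicted)) / float(len(active))

# If every column is predicted, nothing is anomalous:
print(raw_anomaly_score({1, 2, 3}, {0, 1, 2, 3, 4}))  # -> 0.0
# No overlap between active and predicted is fully anomalous:
print(raw_anomaly_score({1, 2, 3}, {7, 8}))           # -> 1.0
```

Note that the score ignores how many extra columns were predicted, which is exactly why a saturated HTM (predicting nearly everything) reports low anomaly scores.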
>> After training the HTM on years of stock market data,
>> the anomaly score dropped quite low; however, when I looked
>> carefully at what was going on, the HTM had, in fact, saturated and
>> was predicting more than half of the columns to be active at each
>> step in the sequence. In effect, it was saying that the sequences
>> were unpredictable and anything was possible in the next step (we
>> already knew that about the stock market, right?). Consequently,
>> whatever happened next was not anomalous.
>>
>> When I look at your example data, I read it this way:
>>
>> At 175, 0.0 was read, and 0.0 is the prediction for the next step.
>> The anomaly score of 0.325 is meaningless, because we don't have
>> data from the previous step.
>>
>> At 176, 62 was read, which doesn't match the prediction of 0.0 (from
>> 175), so it is anomalous (0.65). 52 is predicted for the next step.
>>
>> At 177, 402 is read. It is completely anomalous (1.0). That is,
>> there is no overlap between the columns predicted for the value 52
>> and the columns active for the value 402. If you are using a scalar
>> encoder, that makes sense, since the bit patterns for such different
>> numbers likely have no overlap in the encoding or in the SDR
>> produced by the SP. 0.0 is predicted for the next step.
>>
>> At 178, 0 is read, and the anomaly score drops low (0.125), since
>> the actual input closely matches what was predicted at the previous
>> step. The score isn't exactly 0, because the predicted SDR from the
>> previous step and the encoded SDR for the new input may differ in
>> some columns. In other words, when 0.0 was reported as the
>> prediction in the previous step, this was only an approximate
>> translation of a predicted SDR, where 0.0 was the closest decoded
>> representation. 0.0 is predicted for the next step.
>>
>> At 179, 402 is read, which is completely anomalous (1.0) because the
>> predicted SDR for 0.0 had no column in common with the encoded SDR
>> for 402.
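[Editor's note: the scalar-encoder intuition above (nearby values share bits, distant values share none) can be illustrated with a toy encoder. This is a minimal sketch, not nupic's actual ScalarEncoder; the parameters `n`, `w`, and the value range are made up for illustration.]

```python
def toy_scalar_encode(value, min_val=0.0, max_val=500.0, n=100, w=9):
    """Encode a scalar as a set of w contiguous active bits out of n.

    The starting bit position is proportional to the value, so nearby
    values produce overlapping bit sets and distant values share none.
    """
    span = n - w
    start = int(round(span * (value - min_val) / (max_val - min_val)))
    return set(range(start, start + w))

def overlap(a, b):
    return len(a & b)

# 0 and 3 are close, so their encodings share most of their bits:
print(overlap(toy_scalar_encode(0.0), toy_scalar_encode(3.0)))    # large
# 52 and 402 are far apart, so their encodings share no bits:
print(overlap(toy_scalar_encode(52.0), toy_scalar_encode(402.0))) # 0
```

Zero encoder overlap for 52 vs. 402 is consistent with the 1.0 anomaly scores above: with no shared input bits, the SP's active columns have no reason to match the predicted ones.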
>>
>> 180 is similar to 178, and 0.0 is predicted.
>>
>> At 181, 3 is read. The anomaly score is low (0.05), because the
>> scalar encoder produces overlapping patterns for similar numbers, so
>> there is likely overlap in the SDRs for 0 and 3. 402 is predicted.
>>
>> At 182, 50 is read. The anomaly score is low (0.1), which is a bit
>> puzzling; however, it may be due to saturation. The prediction of
>> 402 could represent a case where many columns were predicted,
>> representing a superposition of possible states, and 402 was just
>> the strongest one (i.e., the encoded SDR for 402 had the highest
>> overlap with the predicted columns). That is, 52 may also have been
>> predicted, but to a lesser degree than 402. It may be helpful to
>> look at how many columns are predicted vs. active at each step to
>> see when this happens. If the number of predicted columns suddenly
>> jumps, it means that the HTM is uncertain about the next step (or,
>> rather, that it sees many possible next steps given the current
>> context).
>>
>> [1] https://github.com/numenta/nupic/blob/master/src/nupic/algorithms/anomaly.py
>>
>> Best regards,
>> Daniel
>>
>> On Tue, Nov 3, 2015 at 4:23 AM, Wakan Tanka <[email protected]> wrote:
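[Editor's note: Daniel's suggestion of watching predicted-vs-active column counts to spot saturation could be sketched like this. The record layout and the 50% threshold are illustrative assumptions, not part of nupic.]

```python
def saturation_report(records, num_columns, threshold=0.5):
    """Flag steps where the fraction of predicted columns exceeds threshold.

    records is a list of (step, active_columns, predicted_columns) tuples,
    where the column arguments are sets of column indices.
    """
    flagged = []
    for step, active, predicted in records:
        frac = len(predicted) / float(num_columns)
        if frac > threshold:
            flagged.append((step, len(active), len(predicted), frac))
    return flagged

# A toy trace over a 10-column region: at step 2, 7 of 10 columns are
# predicted, suggesting the model is hedging across many possible inputs.
trace = [
    (1, {0, 1}, {0, 1, 2}),
    (2, {3, 4}, {0, 1, 2, 3, 4, 5, 6}),
]
print(saturation_report(trace, num_columns=10))  # flags step 2 only
```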
