Hi Wakan, nice discussion, thank you.
I agree with Raf on the importance of correct metrics. I'd like to suggest using anomaly detection, an implicit HTM feature, as a way to tell where and by how much the predictions were off. For some results, you can refer to NAB https://github.com/numenta/NAB (real-world, streaming datasets) and https://github.com/breznak/neural.benchmark for synthetic ones.

Cheers, Mark

On Wed, Jan 13, 2016 at 12:34 PM, Raf <[email protected]> wrote:

>> As you can see, that little bugger of my neighbour's kid makes two mistakes: in the first case, the error according to RMSE/MSE/MAE wouldn't be that big an error (440.00 - 348.23 = 91.77), while the latter case would instead represent a MASSIVE error for the mlearning system (in this case, the kid): 698.46 - 440.00 = 258.46.
>
> I realized that I made a mistake in the second calculation, which should be 698.46 - 348.23 = 350.23 (which is even worse!).
>
> On 13/01/2016 11:26, Raf wrote:
>
>> Hi Wakan,
>>
>> in the case of a classification task (when you have a dataset with already defined "labels"), the RMSE/MSE/MAE would be calculated (at least with other ML models) between the probability predicted for the right class (0.00, 1.00) and the real class (0.00, 1.00). That is why, for this particular job, I wouldn't advise this way of estimating the error. Even using the price as a loss function for a classification task, though... I suppose that wouldn't perform well in the real world (imho).
>>
>> Nonetheless, even in the case of a pure regression task (where you would try to predict the next price, for example), and even calculating RMSE/MSE/MAE as a cost function between prices, it wouldn't be useful in real life (imho). Why? Because it wouldn't take into account non-linear real-life scenarios like possible fees and commissions, price slippage, and lack of liquidity.
>>
>> What you really want is to know if right now is a good moment to "do nothing", "buy" or "sell" - not just the price change.
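A minimal sketch of Mark's anomaly-detection suggestion (my own illustration, not the NAB code): per-record anomaly scores can double as a quality metric - a model whose scores stay low on data it has seen has learned the sequences better, and the spikes tell you where and by how much the prediction was off. The scores here are made up; in NuPIC they would come from the model itself.

```python
def anomaly_summary(scores, threshold=0.5):
    """Mean anomaly score, plus the records where it spiked past a threshold."""
    mean = sum(scores) / len(scores)
    spikes = [(i, s) for i, s in enumerate(scores) if s > threshold]
    return mean, spikes

# Hypothetical per-record anomaly scores from a trained model:
scores = [0.05, 0.02, 0.90, 0.04, 0.03, 0.70, 0.01]
mean, spikes = anomaly_summary(scores)
print(mean)    # overall quality: lower is better
print(spikes)  # [(2, 0.9), (5, 0.7)] - where and how far off
```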
>>
>> I'll try to make an example (sorry if I raise too many trading concepts, but it actually fits the topic, in my opinion).
>>
>> - It is 12:00. The current price of something is 1205.00 (something/USD). The predicted price for three hours from now (15:00) is 1155.00: it is lower, but it is not worth the trade... for now.
>> - One hour passes (it is 13:00); the current real price is 1170.00. Let's imagine that the real price three hours from now will be a lot lower than the actual price: 900.00. The predicted price for the next three hours is 905.00: wonderful prediction! It is a lot lower than the current price, so we sell.
>>
>> But... because this is happening during some news (every day at 15.00 CET or 16.00 CET there is going to be some news that shakes that particular asset, and the machine learning system understood that), your bank decides to increase the spread (or the commissions) in order to filter orders, giving precedence to bigger investments at the expense of smaller ones. Also, because there is huge volatility, your order cannot be processed at the price you hoped to sell at, because your "sell" request doesn't match any other "buy" order at that price: this means your order is processed at a price of 960.00 instead of the original price (1170.00). Furthermore, due to the high network traffic at that time, you can't promptly process the order, slipping by a couple of milliseconds, which turns into yet another price slippage. This means that between the widened spread/commissions and this market "slippage", your profit is almost non-existent, and you could probably even end up losing money.
>>
>> Now, let me recall the original question I've been asking my mlearning system: "is it a good moment for buying, selling, or doing nothing?".
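The arithmetic in the scenario above can be sketched as follows. The 1170.00 hoped-for price, the ~905.00 predicted buy-back price, and the 960.00 fill price come from the example; the widened spread/commission figure is my own assumption for illustration.

```python
# What the "blind" model expects from the short trade vs what it yields
# once the fill price and widened spread are taken into account.
hoped_sell_price = 1170.00   # price when the model decides to sell
buy_back_price   = 905.00    # predicted price 3 hours ahead
fill_price       = 960.00    # actual execution price after slippage
spread_cost      = 25.00     # hypothetical widened spread + commission

# Per-unit profit, selling now and buying back at the predicted low:
expected_profit = hoped_sell_price - buy_back_price               # 265.00
realized_profit = fill_price - buy_back_price - spread_cost       # 30.00

print(expected_profit, realized_profit)
```

A price-only metric scores this trade by `expected_profit`; the realized figure, an order of magnitude smaller, is what Raf means by the profit being "almost non-existent".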
>>
>> Of course a human mind would prefer not to enter into volatility (at 13:00) - instead, some sort of "blind" mlearning system that takes into account only the price change would be tempted to enter at exactly that moment. This is not what we wanted to start with.
>>
>> I made this example just because I'm familiar with these kinds of scenarios, but probably this wasn't the best choice for NuPIC.
>>
>> I'll make another example which, in my opinion, fits a neocortical algorithm better. Let's imagine now that my neighbour has a kid who is learning piano (this is actually happening :-) ).
>> He is learning this[1] song: "Frère Jacques". When I'm working and I listen to him through my wall, my brain (see the HTM paper, which explains this brilliantly) expects this exact sequence[2]: C - D - E - C - C - D - E - C | E - F (...). Now, when the kid learns, of course he makes mistakes (that's how we learn, after all!).
>> When he plays some note wrong (one that is not in the above sequence) he, actually "we" :), understands that he was wrong because my TemporalMultiStep Prediction detected an "anomaly" (trying to use NuPIC jargon here).
>> If his brain had to consider a "brutal" :) RMSE or MAE, he would take the frequencies of the notes on the piano[3] and subtract the expected (real) value from the played value.
>>
>> EXPECTED VALUES (considering the middle C):
>> 261.63 Hz - 293.66 Hz - 329.63 Hz - 261.63 Hz - 261.63 Hz - 293.66 Hz - 329.63 Hz - 261.63 Hz - | - 329.63 Hz - 348.23 Hz - ...
>>
>> PLAYED VALUES (Error 1):
>> 261.63 Hz - 293.66 Hz - 329.63 Hz - 261.63 Hz - 261.63 Hz - 293.66 Hz - 329.63 Hz - 261.63 Hz - | - 329.63 Hz - 440.00 Hz (A) ...
>>
>> PLAYED VALUES (Error 2):
>> 261.63 Hz - 293.66 Hz - 329.63 Hz - 261.63 Hz - 261.63 Hz - 293.66 Hz - 329.63 Hz - 261.63 Hz - | - 329.63 Hz - 698.46 Hz (next octave F instead of current octave F) ...
>>
>> As you can see, that little bugger of my neighbour's kid makes two mistakes: in the first case, the error according to RMSE/MSE/MAE wouldn't be that big an error (440.00 - 348.23 = 91.77), while the latter case would instead represent a MASSIVE error for the mlearning system (in this case, the kid): 698.46 - 440.00 = 258.46.
>>
>> In reality, though, it is much "less horrible" to listen to the second error (which is at least the same note, albeit in the next octave) than the first error (which is a totally different note).
>>
>> If the kid had to learn using RMSE/MSE/MAE, I think he wouldn't have been able to distinguish between "little" and "massive" errors, and so he couldn't learn to play the piano.
>>
>> In the end, what I wanted to stress, to give you my opinion, is that the choice of the loss function/metrics is very, very important for defining the correct learning method of any machine learning system.
>>
>> My two cents :)
>>
>> Raf
>>
>> [1]: https://www.youtube.com/watch?v=eYtuOYABwes
>> [2]: http://www.true-piano-lessons.com/images/FrerejacquesinCtab.jpg
>> [3]: http://amath.colorado.edu/pub/matlab/music/frequencies.jpg
>>
>> On 13/01/2016 10:24, Wakan Tanka wrote:
>>
>>> Thank you Raf,
>>>
>>> why do you think that you would not notice the difference between a little and a big mistake? I suppose that little mistakes will have a lower square and big ones will have a bigger square. When you then average all the values you have obtained this way, it is possible that one big mistake will drastically change the final score, but in general this is what you want, isn't it? It doesn't matter whether you made a lot of small mistakes or one big mistake, if the amount of money you have lost is the same.
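The piano example can be made concrete with a quick sketch (my own illustration, not from the thread): a raw Hz-difference metric ranks the octave error as far worse, while a pitch-based metric that folds octaves together ranks it as nearly perfect, matching what our ears do. The frequencies are the ones from Raf's example.

```python
import math

expected = 348.23   # the F the sequence expects (chart value from [3])
error1   = 440.00   # wrong note: A
error2   = 698.46   # right note, wrong octave: F one octave up

def hz_error(played, expected):
    """The 'brutal' metric: absolute frequency difference in Hz."""
    return abs(played - expected)

def semitone_error(played, expected):
    """Distance in semitones, folded onto one octave (range 0..6),
    so an octave jump of the same note counts as ~0."""
    st = abs(12 * math.log2(played / expected)) % 12
    return min(st, 12 - st)

print(hz_error(error1, expected), hz_error(error2, expected))
# 91.77 vs 350.23: the Hz metric calls the octave error "massive"
print(semitone_error(error1, expected), semitone_error(error2, expected))
# ~4 semitones vs ~0: the pitch metric agrees with the listener
```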
>>>
>>> PS: I suppose that using some other metric (some kind of clustering, or maybe a simpler method using just basic histograms) it should be possible to filter out just those big mistakes.
>>>
>>> Am I wrong?
>>>
>>> On 01/13/2016 08:50 AM, Raf wrote:
>>>
>>>> Hello Wakan.
>>>>
>>>> This is a huge point you are making, and defining a loss function can completely change the validity of an ML algo.
>>>>
>>>> Depending on your task (regression, classification), I strongly suggest you create your own Metrics[1]: this, imho, could have a big impact on how the HTM region processes the data - it literally changes the "learning goal".
>>>>
>>>> I'll try to clarify what I mean with a general classification example not strictly linked to NuPIC.
>>>> Let's imagine I have a simple time series classification task; it's maybe a bit unrealistic, but it'll do the job.
>>>> I'm receiving oil prices and I'd like to know if now is the right moment to perform no action (label 0), to "buy" (label 1) or to "sell" (label 2). The prediction obtained by the algo would consist of the probability for each label; as an example: label 0 = 0.12 (12%), label 1 = 0.70 (70%), label 2 = 0.18 (18%).
>>>> Now, if I just evaluated the error using the distance of the predicted value from the real value in terms of RMSE, I would not notice (and, most of all, I wouldn't let my ML system notice) the subtle differences between a little mistake (the action is wrong and the price difference is not that big) and a big mistake (the action is wrong again, but the price difference is huge this time). In this case, for example, using as a loss function the outcome of the trade in terms of money, as if we had performed the trade for real (including fees and commissions), could, imho, give you a better overall learning process that is more useful in the real world.
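Raf's point can be sketched numerically (my own illustration; the probabilities are his, the prices and fee are assumptions): RMSE over the class probabilities gives the same score whether the wrong "buy" call cost almost nothing or a fortune.

```python
import math

def rmse(pred, true):
    """Root-mean-square error between two equal-length vectors."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))

pred = [0.12, 0.70, 0.18]        # model says "buy" (label 1)
true = [1.00, 0.00, 0.00]        # the right action was "do nothing" (label 0)

# The probability-space error is identical in both scenarios below:
print(rmse(pred, true))

# ...but the monetary outcome, had we traded (hypothetical prices,
# entry 1205.00, plus a 0.50 fee), differs wildly:
small_loss = (1205.00 - 1203.50) + 0.50   # price barely moved: 2.00 lost
big_loss   = (1205.00 - 1100.00) + 0.50   # price crashed: 105.50 lost
print(small_loss, big_loss)
```

A money-based loss function would assign these two mistakes the scores 2.00 and 105.50 instead of one identical RMSE value, which is exactly the distinction Raf wants the learner to see.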
>>>> Of course, this has nothing to do with NuPIC per se, but I suppose it is common to basically all the ML algos you can think of.
>>>>
>>>> Raf
>>>>
>>>> [1] https://github.com/numenta/nupic/blob/master/src/nupic/frameworks/opf/metrics.py
>>>>
>>>> On 13/01/2016 02:19, Wakan Tanka wrote:
>>>>
>>>>> Hello NuPIC,
>>>>>
>>>>> How do you evaluate the correctness and accuracy of a prediction? Or, if you have multiple predictions for the same data, how do you compare which prediction was more accurate? I've seen that there is NAB [1], but to be honest I have not dug into it deeply, so I do not know whether it might help or not. AFAIK, when you want to do such things, correlation should work fine - in this case, correlation between the original and the predicted data. But correlation only works when you have linear data; it would not work e.g. on the hotgym example, where you have repeating cycles, peaks, maybe random events on particular days, etc. So my intuitive approach was to calculate the absolute difference [2] between the original and the predicted value and then calculate the mean of those values. The lower the mean is, the better the prediction is. Then I realized that there is the standard deviation [3], which can be calculated from those absolute differences. The next step would be to pick out all values whose absolute difference between original and predicted value is:
>>>>> 1. above mean + standard deviation
>>>>> 2. below mean - standard deviation
>>>>>
>>>>> This should give me an overview of how many values fall within this interval and how many don't. The dataset where more values fall within the interval is the dataset with the better prediction.
>>>>>
>>>>> Does this make sense?
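The evaluation Wakan describes can be sketched in a few lines (the data values here are made up for illustration): mean absolute difference, its standard deviation, and the fraction of errors falling inside [mean - std, mean + std].

```python
import math

def evaluate(original, predicted):
    """Mean absolute error, its (population) std deviation, and the
    fraction of errors inside [mean - std, mean + std]."""
    errors = [abs(o - p) for o, p in zip(original, predicted)]
    mean = sum(errors) / len(errors)
    std = math.sqrt(sum((e - mean) ** 2 for e in errors) / len(errors))
    inside = sum(1 for e in errors if mean - std <= e <= mean + std)
    return mean, std, inside / len(errors)

original  = [10.0, 12.0, 11.0, 13.0, 12.0]
predicted = [10.5, 11.0, 11.2, 20.0, 12.1]  # one large miss among small ones

mean, std, frac_inside = evaluate(original, predicted)
print(mean, std, frac_inside)
```

Note that a single large miss (the 7.0 error above) inflates both the mean and the std, which is exactly the sensitivity-to-outliers behaviour the thread debates.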
>>>>>
>>>>> [1] http://numenta.com/blog/nab-a-benchmark-for-streaming-anomaly-detection.html
>>>>> [2] https://en.wikipedia.org/wiki/Absolute_difference
>>>>> [3] http://www.mathsisfun.com/data/standard-deviation.html
>
> --
> Raf
>
> www.madraf.com/algotrading
> reply to: [email protected]
> skype: algotrading_madraf

--
Marek Otahal :o)
