Ashen, when you conclude this, can you write a blog article comparing the different methods and explaining why a given one is better?
--Srinath

On Thu, Sep 17, 2015 at 9:59 AM, Srinath Perera <srin...@wso2.com> wrote:

> Seshika and I were talking to a Forrester analyst, and he mentioned that the "Lorenz curve" is used in fraud cases.
>
> Please read up on what it is and how it compares to ROC etc.; see
> https://www.quora.com/What-is-the-difference-between-a-ROC-curve-and-a-precision-recall-curve-When-should-I-use-each

On Thu, Sep 17, 2015 at 9:07 AM, CD Athuraliya <chathur...@wso2.com> wrote:

> Hi Ashen,
>
> When selecting evaluation measures, please note the class imbalance that typically occurs in anomaly data (anomalous instances can be very infrequent compared to normal instances in a real-world dataset). Please check how this imbalance affects the evaluation measures; I found this paper [1] on the topic.
>
> And since the data clusters play a vital role in this model, it would be better if we could show some measures on them as well, IMO.
>
> [1] http://marmota.dlsi.uji.es/WebBIB/papers/2007/1_GarciaTamida2007.pdf
>
> Regards,
> CD

On Thu, Sep 17, 2015 at 6:18 AM, A. R. Weerasinghe <a...@ucsc.cmb.ac.lk> wrote:

> I'm sorry, that was a general answer.
>
> For anomaly detection, I'd say sensitivity or specificity (depending on what is positive and what is negative: Mahesan's point) is more important than the others.
>
> For example, in a data set of 10,000 samples where 100 are labeled positive (anomalous), a predictor that outputs "Negative" for every instance evaluates to Precision = 100%, Accuracy = 99%, and Specificity = 100%. This predictor would be entirely useless, yet these measures suggest it performs very well. The same predictor evaluates to Recall (sensitivity) = 0%. In this case, sensitivity seems most in tune with how well the classifier is actually performing.
>
> The other extreme is a data set where most of the examples are positive (normal).
> For example, if 9,900 out of 10,000 instances are positive and a classifier predicts positive on every instance, then Precision = 99%, Accuracy = 99%, Specificity = 0%, and Recall = 100%. In this case, specificity shows that the classifier is problematic.
>
> Hope this helps.

On Thu, Sep 17, 2015 at 6:05 AM, A. R. Weerasinghe <a...@ucsc.cmb.ac.lk> wrote:

> Usually the F1 measure and the area under the ROC curve.
>
> Ruvan.

On Thu, Sep 17, 2015 at 5:20 AM, Sinnathamby Mahesan <sinnatha...@wso2.com> wrote:

> Dear Ashen,
>
> Sensitivity - with a view to reducing false negatives
> Precision - with a view to reducing false positives
>
> The F1 score combines both as the harmonic mean of precision and sensitivity.
>
> That's why F1 is normally chosen, and it is simple: F1 = 2TP / (2TP + FN + FP)
>
> By the way, which do you consider a true positive:
> (a) Anomaly - Anomaly, or
> (b) Normal - Normal?
>
> I think case (a) is more suited with regard to your objective.
>
> Or, if you have trouble choosing which way:
>
> You could consider Accuracy (Acc), which is somewhat similar to F1 but gives the same weight to TP and TN:
> Acc = (TP + TN) / (TP + TN + FN + FP)
>
> Good luck

On 16 September 2015 at 15:35, Ashen Weerathunga <as...@wso2.com> wrote:

> Hi all,
>
> I am currently integrating the anomaly detection feature into ML. I have the problem of choosing the best accuracy measure for the model. I can get the confusion matrix, which consists of true positives, true negatives, false positives and false negatives. There are a few different measures such as sensitivity, accuracy, F1 score, etc. So what would be the best measure to report as the model accuracy for an anomaly detection model?
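Ruvan's two extreme cases above are easy to verify directly from the confusion-matrix counts. A minimal Python sketch (editorial illustration; the `metrics` helper is not from any WSO2 ML API):

```python
def metrics(tp, fp, tn, fn):
    """Derive the measures discussed above from confusion-matrix counts.

    Ratios with a zero denominator are undefined and returned as None
    (e.g. precision when the model never predicts positive).
    """
    def ratio(num, den):
        return num / den if den else None

    return {
        "precision":   ratio(tp, tp + fp),               # PPV
        "recall":      ratio(tp, tp + fn),               # sensitivity / TPR
        "specificity": ratio(tn, tn + fp),               # SPC / TNR
        "accuracy":    ratio(tp + tn, tp + fp + tn + fn),
    }

# Extreme 1: 100 anomalies in 10,000 samples, predictor always says "Negative".
# Accuracy (0.99) and specificity (1.0) look excellent, precision is undefined
# (no positive predictions at all), but recall = 0.0 exposes the useless model.
print(metrics(tp=0, fp=0, tn=9900, fn=100))

# Extreme 2: 9,900 of 10,000 instances are positive, predictor always says
# "Positive". Precision, accuracy and recall all look excellent;
# specificity = 0.0 is the measure that flags the problem.
print(metrics(tp=9900, fp=100, tn=0, fn=0))
```

Note that Ruvan's first example reports Precision = 100% by convention for the 0/0 case; the sketch returns None instead, to make the degenerate case explicit.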
> Some details about these measures [1]:
>
> Terminology and derivations from a confusion matrix:
> - true positive (TP): eqv. with hit
> - true negative (TN): eqv. with correct rejection
> - false positive (FP): eqv. with false alarm, Type I error
> - false negative (FN): eqv. with miss, Type II error
>
> - sensitivity, or true positive rate (TPR); eqv. with hit rate, recall: TPR = TP / P = TP / (TP + FN)
> - specificity (SPC), or true negative rate: SPC = TN / N = TN / (TN + FP)
> - precision, or positive predictive value (PPV): PPV = TP / (TP + FP)
> - negative predictive value (NPV): NPV = TN / (TN + FN)
> - fall-out, or false positive rate (FPR): FPR = FP / N = FP / (FP + TN) = 1 - SPC
> - false negative rate (FNR): FNR = FN / (TP + FN) = 1 - TPR
> - false discovery rate (FDR): FDR = FP / (TP + FP) = 1 - PPV
>
> - accuracy (ACC): ACC = (TP + TN) / (TP + FP + FN + TN)
> - F1 score, the harmonic mean of precision and sensitivity: F1 = 2TP / (2TP + FP + FN)
> - Matthews correlation coefficient (MCC): MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
> - Informedness: TPR + SPC - 1
> - Markedness: PPV + NPV - 1
>
> Sources: Fawcett (2006) and Powers (2011), via [1].
>
> [1] https://en.wikipedia.org/wiki/Sensitivity_and_specificity
>
> Thanks and Regards,
> Ashen
>
> --
> Ashen Weerathunga
> Software Engineer - Intern
> WSO2 Inc.: http://wso2.com
> lean.enterprise.middleware
>
> Email: as...@wso2.com
> Mobile: +94 716042995
> LinkedIn:
> http://lk.linkedin.com/in/ashenweerathunga

--
Sinnathamby Mahesan
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--
CD Athuraliya
Software Engineer
WSO2, Inc.
lean . enterprise . middleware
Mobile: +94 716288847
LinkedIn: http://lk.linkedin.com/in/cdathuraliya | Twitter: https://twitter.com/cdathuraliya | Blog: http://cdathuraliya.tumblr.com/

--
============================
Blog: http://srinathsview.blogspot.com  Twitter: @srinath_perera
Site: http://people.apache.org/~hemapani/
Photos: http://www.flickr.com/photos/hemapani/
Phone: 0772360902
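Of the composite measures in Ashen's table, F1 and MCC are the two most often suggested for imbalanced data: F1 ignores TN (useful when negatives dominate), while MCC uses all four cells of the confusion matrix. A minimal sketch of both, using the formulas quoted in the thread (function names are illustrative):

```python
import math

def f1_score(tp, fp, fn):
    # F1 = 2TP / (2TP + FP + FN): harmonic mean of precision and sensitivity.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else None

def mcc(tp, fp, tn, fn):
    # Matthews correlation coefficient: uses all four confusion-matrix cells.
    # Undefined (None) when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else None

# The always-"Negative" predictor from Ruvan's first example:
print(f1_score(tp=0, fp=0, fn=100))      # → 0.0 (F1 exposes the useless model)
print(mcc(tp=0, fp=0, tn=9900, fn=100))  # → None (no positive predictions)

# A perfect classifier on a balanced sample:
print(mcc(tp=50, fp=0, tn=50, fn=0))     # → 1.0
```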
_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev