Seshika and I were talking to a Forrester analyst, and he mentioned that the
"Lorenz curve" is used in fraud cases.

Please read up on it and find out what it is and how it compares to ROC curves etc. See:
https://www.quora.com/What-is-the-difference-between-a-ROC-curve-and-a-precision-recall-curve-When-should-I-use-each
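To make the comparison concrete, here is a small sketch of my own (not from any source in this thread). Under one common convention in credit/fraud scoring, the Gini coefficient of the Lorenz-style (CAP) curve is linearly related to the area under the ROC curve: Gini = 2 * AUC - 1. The function names below are mine, and the AUC is computed via the rank-sum (Mann-Whitney U) formulation:

```python
# Rank-based ROC AUC (equivalent to the Mann-Whitney U statistic),
# in pure Python; handles tied scores via average ranks.
def roc_auc(labels, scores):
    """labels: 1 = positive (e.g. fraud/anomaly), 0 = negative."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1  # j is one past the last score tied with pairs[i]
        avg_rank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    p = sum(labels)
    neg = len(labels) - p
    # AUC from the rank-sum of the positives: U / (P * N).
    return (rank_sum_pos - p * (p + 1) / 2) / (p * neg)

def gini_from_scores(labels, scores):
    # Gini coefficient of the Lorenz/CAP curve, via its relation to ROC AUC.
    return 2 * roc_auc(labels, scores) - 1

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))           # 0.75
print(gini_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.5
```

Note that "Lorenz curve" is also used with a slightly different axis convention in some fraud literature; the linear Gini/AUC relation above holds for the ranked-scores (CAP) form.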

On Thu, Sep 17, 2015 at 9:07 AM, CD Athuraliya <chathur...@wso2.com> wrote:

> Hi Ashen,
>
> Please note the class imbalance that typically occurs in anomaly data when
> selecting evaluation measures (anomalous data can be very infrequent
> compared to normal data in a real-world dataset). Please check how this
> imbalance affects the evaluation measures. I found this paper [1] on the topic.
>
> And since the data clusters play a vital role in this model, it would be
> better if we could show some measures on them as well, IMO.
>
> [1] http://marmota.dlsi.uji.es/WebBIB/papers/2007/1_GarciaTamida2007.pdf
>
> Regards,
> CD
>
> On Thu, Sep 17, 2015 at 6:18 AM, A. R.Weerasinghe <a...@ucsc.cmb.ac.lk>
> wrote:
>
>> I'm sorry, that was a general answer.
>>
>> For anomaly detection, I'd say sensitivity or specificity (depending on
>> what is positive and what is negative: Mahesan's point) is more important
>> than the others.
>>
>> For example, in a data set of 10,000 samples where 100 are labeled positive
>> (anomalous), a predictor that outputs "negative" for every instance it is
>> presented with evaluates to Accuracy = 99% and Specificity = 100%, while
>> Precision is undefined (0/0, though it is sometimes reported as 100%). This
>> predictor is entirely useless, and yet these measures suggest it performs
>> very well. The same predictor evaluates to Recall (sensitivity) = 0%. In
>> this case, sensitivity is the measure most in tune with how well the
>> classifier is actually performing.
>>
>> The other extreme is a data set where most of the examples are positive.
>> For example, if 9,900 out of 10,000 instances are positive and a classifier
>> predicts positive for every instance, then Precision = 99%, Accuracy = 99%,
>> Specificity = 0%, and Recall = 100%. In this case, specificity shows that
>> the classifier is problematic.
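>> The numbers for both degenerate classifiers can be checked with a short
>> sketch (mine, not part of the original discussion); it treats 0/0 as
>> undefined rather than as 100%:

```python
# Evaluation measures from confusion-matrix counts; 0/0 is returned as None.
def metrics(tp, tn, fp, fn):
    div = lambda a, b: a / b if b else None
    return {
        "precision": div(tp, tp + fp),                # PPV
        "recall": div(tp, tp + fn),                   # sensitivity / TPR
        "specificity": div(tn, tn + fp),              # TNR
        "accuracy": div(tp + tn, tp + tn + fp + fn),
    }

# 100 positives in 10,000; classifier always predicts "negative":
print(metrics(tp=0, tn=9900, fp=0, fn=100))
# {'precision': None, 'recall': 0.0, 'specificity': 1.0, 'accuracy': 0.99}

# 9,900 positives in 10,000; classifier always predicts "positive":
print(metrics(tp=9900, tn=0, fp=100, fn=0))
# {'precision': 0.99, 'recall': 1.0, 'specificity': 0.0, 'accuracy': 0.99}
```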
>>
>> Hope this helps.
>>
>>
>>
>> On Thu, Sep 17, 2015 at 6:05 AM, A. R.Weerasinghe <a...@ucsc.cmb.ac.lk>
>> wrote:
>>
>>> Usually the F1 measure and the area under the ROC curve (AUC).
>>>
>>> Ruvan.
>>>
>>>
>>> On Thu, Sep 17, 2015 at 5:20 AM, Sinnathamby Mahesan <
>>> sinnatha...@wso2.com> wrote:
>>>
>>>> Dear Ashen,
>>>> Sensitivity - with a view to reducing false negatives
>>>> Precision - with a view to reducing false positives
>>>>
>>>> The F1 score combines both as the harmonic mean of precision and
>>>> sensitivity.
>>>>
>>>> That's why F1 is normally chosen, and it is simple: F1 = 2TP / (2TP + FN + FP)
>>>>
>>>>
>>>>
>>>> By the way, which do you consider the true positive case:
>>>> (a) Anomaly - Anomaly
>>>> or
>>>> (b) Normal - Normal?
>>>>
>>>> I think case (a) is better suited to your objective.
>>>>
>>>> Or, if you have trouble choosing:
>>>>
>>>> You could consider Accuracy (Acc), which is somewhat similar to F1 but
>>>> gives the same weight to TP and TN:
>>>> Acc = (TP + TN) / (TP + TN + FN + FP)
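>>>> A quick sketch (not from this thread; the counts are hypothetical)
>>>> contrasting the two formulas. Note that F1 never uses TN, so it ignores
>>>> the size of the negative class, while accuracy weighs TP and TN equally:

```python
# F1 vs. accuracy, computed from confusion-matrix counts.
def f1_score(tp, fp, fn):
    # Harmonic mean of precision and sensitivity: 2TP / (2TP + FP + FN).
    return 2 * tp / (2 * tp + fp + fn)

def accuracy(tp, tn, fp, fn):
    # Gives the same weight to TP and TN.
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts with anomaly = positive (case (a)):
print(f1_score(tp=90, fp=60, fn=10))           # 0.72
print(accuracy(tp=90, tn=9840, fp=60, fn=10))  # 0.993
# Accuracy looks high mostly because TN dominates; F1 tells a different story.
```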
>>>>
>>>>
>>>>
>>>> Good luck!
>>>>
>>>>
>>>>
>>>>
>>>> On 16 September 2015 at 15:35, Ashen Weerathunga <as...@wso2.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am currently integrating the anomaly detection feature for ML, and I
>>>>> have the problem of choosing the best accuracy measure for the model. I
>>>>> can get the confusion matrix, which consists of true positives, true
>>>>> negatives, false positives and false negatives. There are a few different
>>>>> measures such as sensitivity, accuracy, F1 score, etc. What would be the
>>>>> best measure to report as the accuracy of an anomaly detection model?
>>>>>
>>>>> Some details about these measures [1]:
>>>>>
>>>>> Terminology and derivations from a confusion matrix:
>>>>>
>>>>> - true positive (TP): hit
>>>>> - true negative (TN): correct rejection
>>>>> - false positive (FP): false alarm, Type I error
>>>>> - false negative (FN): miss, Type II error
>>>>>
>>>>> - sensitivity, recall, hit rate, or true positive rate (TPR):
>>>>>   TPR = TP / P = TP / (TP + FN)
>>>>> - specificity (SPC) or true negative rate:
>>>>>   SPC = TN / N = TN / (TN + FP)
>>>>> - precision or positive predictive value (PPV):
>>>>>   PPV = TP / (TP + FP)
>>>>> - negative predictive value (NPV):
>>>>>   NPV = TN / (TN + FN)
>>>>> - fall-out or false positive rate (FPR):
>>>>>   FPR = FP / N = FP / (FP + TN) = 1 - SPC
>>>>> - false negative rate (FNR):
>>>>>   FNR = FN / (TP + FN) = 1 - TPR
>>>>> - false discovery rate (FDR):
>>>>>   FDR = FP / (TP + FP) = 1 - PPV
>>>>>
>>>>> - accuracy (ACC):
>>>>>   ACC = (TP + TN) / (TP + TN + FP + FN)
>>>>> - F1 score (harmonic mean of precision and sensitivity):
>>>>>   F1 = 2TP / (2TP + FP + FN)
>>>>> - Matthews correlation coefficient (MCC):
>>>>>   MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
>>>>> - Informedness: TPR + SPC - 1
>>>>> - Markedness: PPV + NPV - 1
>>>>>
>>>>> Sources: Fawcett (2006) and Powers (2011).
>>>>>
>>>>> [1] https://en.wikipedia.org/wiki/Sensitivity_and_specificity
>>>>>
>>>>> Thanks and Regards,
>>>>> Ashen
>>>>> --
>>>>> *Ashen Weerathunga*
>>>>> Software Engineer - Intern
>>>>> WSO2 Inc.: http://wso2.com
>>>>> lean.enterprise.middleware
>>>>>
>>>>> Email: as...@wso2.com
>>>>> Mobile: +94 716042995
>>>>> LinkedIn: http://lk.linkedin.com/in/ashenweerathunga
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> Sinnathamby Mahesan
>>>>
>>>>
>>>>
>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>
>>>
>>>
>>
>
>
> --
> *CD Athuraliya*
> Software Engineer
> WSO2, Inc.
> lean . enterprise . middleware
> Mobile: +94 716288847
> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
> <https://twitter.com/cdathuraliya> | Blog
> <http://cdathuraliya.tumblr.com/>
>



-- 
============================
Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
Site: http://people.apache.org/~hemapani/
Photos: http://www.flickr.com/photos/hemapani/
Phone: 0772360902
_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev
