Hi all,

Thanks, Mahesan, for the suggestion. Yes, we can give all the measures if that
is better.

However, there is a problem with drawing a PR curve or an ROC curve. Since we
can get only one point from the confusion matrix, we cannot include a PR or
ROC curve in the model summary. Currently, the ROC curve is provided only for
probabilistic classification methods, where it is calculated using the model
itself. In this scenario, however, we use the k-means algorithm: after
generating the clusters, we evaluate the model on the test data according to
the percentile value the user provided. As a result we get a confusion matrix
consisting of TP, TN, FP, and FN, but that alone is not enough to draw a PR or
ROC curve. Does anyone have any suggestions about that, or should we drop it?
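One possible way around the single-point limitation: if we could expose an
anomaly score for each test point (e.g. its distance to the nearest cluster
centroid) instead of only the thresholded labels, then sweeping the threshold
over the scores gives one (FPR, TPR) point per threshold, which is enough to
plot an ROC curve. A rough sketch of the idea (the function and variable names
here are illustrative only, not part of the ML implementation):

```python
def roc_points(scores, labels):
    """Compute ROC points from anomaly scores.

    scores: higher score = more anomalous (e.g. distance to nearest centroid)
    labels: 1 = true anomaly, 0 = normal
    Returns a list of (FPR, TPR) pairs, one per distinct threshold.
    """
    pos = sum(labels)              # total true anomalies (TP + FN)
    neg = len(labels) - pos        # total normals (TN + FP)
    points = []
    # Sweep the threshold over every distinct score, strictest first.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))  # (FPR, TPR) at this threshold
    return points

# Hypothetical scores/labels for a small test set; the last point is
# always (1.0, 1.0) once every instance is flagged as anomalous.
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   0]
print(roc_points(scores, labels))
```

The same sweep gives a PR curve if we compute (recall, precision) per
threshold instead. The catch is that it requires keeping the raw scores
around rather than just the single percentile cut the user configured.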

On Mon, Sep 21, 2015 at 7:05 AM, Sinnathamby Mahesan <[email protected]>
wrote:

> Ashen
> Here is a situation:
> Doctors  are testing a person for a disease, say, d.
> Doctor's point of view +ve means  patient has (d)
>
> Which of the following is worse than the other?
> (1) A person who does NOT have (d) is identified as having (d)
>  (that is, a false positive)
> (2) A person who does have (d) is identified as NOT having (d)
>  (that is, a false negative)
>
> The doctors' argument is that we have to be more concerned about reducing
> case (2).
> That is to say, the sensitivity needs to be high.
>
> Anyway, I also thought it is better to display all the measures: sensitivity,
> specificity, precision, and F1-score
> (suggesting to consider sensitivity for the case where anomalous is the
> positive class).
>
> Good Luck
> Mahesan
>
>
> On 18 September 2015 at 15:27, Ashen Weerathunga <[email protected]> wrote:
>
>> Hi all,
>>
>> Since we are considering anomaly detection, a true positive would be a
>> case where a true anomaly is detected as an anomaly by the model. As you
>> said, in the real-world anomaly detection scenario the positive (anomaly)
>> instances are very rare, so we can't go for a more general measure. I can
>> summarize the most applicable measures as below:
>>
>>    - Sensitivity (recall) - gives the true positive rate ( TP / (TP + FN) )
>>    - Precision - gives the probability that a positive prediction is a
>>    true positive ( TP / (TP + FP) )
>>    - PR curve - the precision-recall (sensitivity) curve - plots
>>    precision vs. recall
>>    - F1 score - gives the harmonic mean of precision and
>>    sensitivity (recall) ( 2TP / (2TP + FP + FN) )
>>
>> So precision and sensitivity are the most suitable measures for a model
>> where positive instances are very rare, and the PR curve and F1 score
>> combine both of them. So the PR curve and F1 score can be used to tell how
>> good the model is, IMO. We can also give sensitivity and precision
>> separately.
>>
>> Thanks everyone for the support.
>>
>> @Srinath, sure, I will write an article.
>>
>>
>> Thanks and Regards,
>>
>> Ashen
>>
>> On Thu, Sep 17, 2015 at 10:19 AM, madhuka udantha <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> This is a good survey paper on anomaly detection [1]. You will probably
>>> not need to go through the whole survey, but a few of its subtopics will
>>> be very useful for your work.
>>>
>>> [1] Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly
>>> detection: A survey. ACM Comput. Surv. 41, 3, Article 15 (July 2009), 58
>>> pages. DOI=10.1145/1541880.1541882
>>> <http://www.researchgate.net/profile/Vipin_Kumar26/publication/220565847_Anomaly_detection_A_survey/links/0deec5161f0ca7302a000000.pdf>
>>> [Cited by 2458]
>>>
>>> On Wed, Sep 16, 2015 at 3:35 PM, Ashen Weerathunga <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am currently integrating the anomaly detection feature into ML. I
>>>> have the problem of choosing the best accuracy measure for the model. I
>>>> can get the confusion matrix, which consists of true positives, true
>>>> negatives, false positives, and false negatives. There are a few
>>>> different measures such as sensitivity, accuracy, F1 score, etc. So what
>>>> would be the best measure to report as the accuracy of an anomaly
>>>> detection model?
>>>>
>>>> Some details about those measures [1]:
>>>> [1] <https://en.wikipedia.org/wiki/Sensitivity_and_specificity>
>>>>
>>>> Terminology and derivations from a confusion matrix
>>>> <https://en.wikipedia.org/wiki/Confusion_matrix>:
>>>>
>>>> true positive (TP): eqv. with hit
>>>> true negative (TN): eqv. with correct rejection
>>>> false positive (FP): eqv. with false alarm, Type I error
>>>> false negative (FN): eqv. with miss, Type II error
>>>> ------------------------------
>>>> sensitivity or true positive rate (TPR), eqv. with hit rate, recall:
>>>>   TPR = TP / P = TP / (TP + FN)
>>>> specificity (SPC) or true negative rate:
>>>>   SPC = TN / N = TN / (TN + FP)
>>>> precision or positive predictive value (PPV):
>>>>   PPV = TP / (TP + FP)
>>>> negative predictive value (NPV):
>>>>   NPV = TN / (TN + FN)
>>>> fall-out or false positive rate (FPR):
>>>>   FPR = FP / N = FP / (FP + TN) = 1 - SPC
>>>> false negative rate (FNR):
>>>>   FNR = FN / (TP + FN) = 1 - TPR
>>>> false discovery rate (FDR):
>>>>   FDR = FP / (TP + FP) = 1 - PPV
>>>> ------------------------------
>>>> accuracy (ACC):
>>>>   ACC = (TP + TN) / (TP + FP + FN + TN)
>>>> F1 score, the harmonic mean of precision and sensitivity:
>>>>   F1 = 2TP / (2TP + FP + FN)
>>>> Matthews correlation coefficient (MCC):
>>>>   MCC = (TP*TN - FP*FN) / sqrt( (TP+FP)(TP+FN)(TN+FP)(TN+FN) )
>>>> Informedness: TPR + SPC - 1
>>>> Markedness: PPV + NPV - 1
>>>>
>>>> Sources: Fawcett (2006) and Powers (2011).
>>>>
>>>> Thanks and Regards,
>>>> Ashen
>>>> --
>>>> *Ashen Weerathunga*
>>>> Software Engineer - Intern
>>>> WSO2 Inc.: http://wso2.com
>>>> lean.enterprise.middleware
>>>>
>>>> Email: [email protected]
>>>> Mobile: +94 716042995 <94716042995>
>>>> LinkedIn:
>>>> *http://lk.linkedin.com/in/ashenweerathunga
>>>> <http://lk.linkedin.com/in/ashenweerathunga>*
>>>>
>>>> _______________________________________________
>>>> Dev mailing list
>>>> [email protected]
>>>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>>>
>>>>
>>>
>>>
>>> --
>>> Cheers,
>>> Madhuka Udantha
>>> http://madhukaudantha.blogspot.com
>>>
>>
>>
>>
>>
>
>
>
> --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Sinnathamby Mahesan
>
>
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>



