Hi Ashen,

Please note the class imbalance that typically occurs in anomaly data when
selecting evaluation measures (anomalous data can be very infrequent
compared to normal data in a real-world dataset). Please check how this
imbalance affects the evaluation measures. I found this paper [1] on the topic.
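
To illustrate the effect, here is a rough Python sketch with made-up
confusion-matrix counts (not our data): a detector that misses half of the
anomalies still scores 99% accuracy on a heavily imbalanced set, while F1
reflects the actual detection quality.

    # Hypothetical counts: 100 anomalies (positive) in 10,000 samples,
    # with the detector catching only half of them.
    tp, fn = 50, 50      # anomalies caught / missed
    fp, tn = 50, 9850    # false alarms / correctly ignored normal points

    accuracy = (tp + tn) / (tp + tn + fp + fn)   # 0.99 despite missing half
    precision = tp / (tp + fp)                   # 0.50
    recall = tp / (tp + fn)                      # 0.50
    f1 = 2 * tp / (2 * tp + fp + fn)             # 0.50
    print(accuracy, precision, recall, f1)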

And since the data clusters play a vital role in this model, it would be
better if we could also show some measures on them, IMO.

[1] http://marmota.dlsi.uji.es/WebBIB/papers/2007/1_GarciaTamida2007.pdf

Regards,
CD

On Thu, Sep 17, 2015 at 6:18 AM, A. R.Weerasinghe <a...@ucsc.cmb.ac.lk>
wrote:

> I'm sorry, that was a general answer.
>
> For anomaly detection, I'd say sensitivity or specificity (depending on
> what is positive and what is negative: Mahesan's point) is more important
> than the others.
>
> For example, in a data set of 10,000 samples where 100 of these samples
> are labeled positive (anomalous), a predictor that predicts "Negative" for
> every instance it is presented with evaluates to Accuracy = 99% and
> Specificity = 100% (Precision is undefined, since it makes no positive
> predictions). This predictor would be entirely useless, and yet these
> measures show it performs very well. The same predictor would evaluate to
> Recall (sensitivity) = 0%. In this case, Sensitivity seems to be most in
> tune with how well the classifier is actually performing.
>
> The other extreme is a data set where many of the examples are positive
> (normal). For example if 9,900 out of 10,000 instances are positive, and a
> classifier predicts positive on all instances, then Precision = 99%,
> Accuracy = 99%, Specificity = 0%, and Recall = 100%. In this case,
> Specificity shows that this classifier is problematic.
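>
> To make the comparison concrete, here is a small Python sketch (the
> helper function below is my own, and it returns NaN where a measure is
> undefined) that reproduces both scenarios:
>
>     def rates(tp, fp, tn, fn):
>         def div(a, b):
>             return a / b if b else float('nan')  # undefined ratios -> NaN
>         return {'precision':   div(tp, tp + fp),
>                 'accuracy':    div(tp + tn, tp + fp + tn + fn),
>                 'specificity': div(tn, tn + fp),
>                 'recall':      div(tp, tp + fn)}
>
>     # Always-"Negative" predictor, 100 positives out of 10,000 samples:
>     print(rates(tp=0, fp=0, tn=9900, fn=100))
>     # precision: nan, accuracy: 0.99, specificity: 1.0, recall: 0.0
>
>     # Always-"Positive" predictor, 9,900 positives out of 10,000 samples:
>     print(rates(tp=9900, fp=100, tn=0, fn=0))
>     # precision: 0.99, accuracy: 0.99, specificity: 0.0, recall: 1.0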
>
> Hope this helps.
>
>
>
> On Thu, Sep 17, 2015 at 6:05 AM, A. R.Weerasinghe <a...@ucsc.cmb.ac.lk>
> wrote:
>
>> Usually F1 measure and area under ROC curve.
>>
>> Ruvan.
>>
>>
>> On Thu, Sep 17, 2015 at 5:20 AM, Sinnathamby Mahesan <
>> sinnatha...@wso2.com> wrote:
>>
>>> Dear Ashen
>>> Sensitivity - in view of reducing false negatives
>>> Precision - in view of reducing false positives
>>>
>>> F1 score combines both as the harmonic mean of precision and sensitivity
>>>
>>> That's why F1 is normally chosen, and it is simple: F1 = 2TP / (2TP + FN + FP)
>>>
>>>
>>>
>>> By the way, which do you consider a true positive:
>>> (a) Anomaly  - Anomaly
>>> or
>>> (b) Normal - Normal
>>>
>>> I think case (a) is more suited with regard to your objective.
>>>
>>> Or, if you have trouble choosing which way:
>>>
>>> You could consider Accuracy (Acc), which is somewhat similar to F1 but
>>> gives the same weight to TP and TN:
>>> Acc = (TP + TN) / (TP + TN + FN + FP)
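>>>
>>> As a quick worked example with made-up counts where true negatives
>>> dominate, the two measures can disagree sharply:
>>>
>>>     tp, tn, fp, fn = 50, 9850, 50, 50
>>>     f1 = 2 * tp / (2 * tp + fn + fp)       # 0.50
>>>     acc = (tp + tn) / (tp + tn + fn + fp)  # 0.99, dominated by the many TN
>>>     print(f1, acc)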
>>>
>>>
>>>
>>> Good luck!
>>>
>>>
>>>
>>>
>>> On 16 September 2015 at 15:35, Ashen Weerathunga <as...@wso2.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am currently integrating the anomaly detection feature into ML, and I
>>>> have a problem choosing the best accuracy measure for the model. I can
>>>> get the confusion matrix, which consists of true positives, true
>>>> negatives, false positives, and false negatives. There are a few
>>>> different measures such as sensitivity, accuracy, F1 score, etc. So what
>>>> would be the best measure to report as the model accuracy for an anomaly
>>>> detection model?
>>>>
>>>> Some details about those measures [1]
>>>> <https://en.wikipedia.org/wiki/Sensitivity_and_specificity>:
>>>>
>>>> Terminology and derivations from a confusion matrix
>>>> <https://en.wikipedia.org/wiki/Confusion_matrix>:
>>>>
>>>> true positive (TP): eqv. with hit
>>>> true negative (TN): eqv. with correct rejection
>>>> false positive (FP): eqv. with false alarm, Type I error
>>>> false negative (FN): eqv. with miss, Type II error
>>>>
>>>> sensitivity, true positive rate (TPR), hit rate, or recall:
>>>>   TPR = TP / P = TP / (TP + FN)
>>>> specificity (SPC) or true negative rate:
>>>>   SPC = TN / N = TN / (TN + FP)
>>>> precision or positive predictive value (PPV):
>>>>   PPV = TP / (TP + FP)
>>>> negative predictive value (NPV):
>>>>   NPV = TN / (TN + FN)
>>>> fall-out or false positive rate (FPR):
>>>>   FPR = FP / N = FP / (FP + TN) = 1 - SPC
>>>> false negative rate (FNR):
>>>>   FNR = FN / (TP + FN) = 1 - TPR
>>>> false discovery rate (FDR):
>>>>   FDR = FP / (TP + FP) = 1 - PPV
>>>>
>>>> accuracy (ACC):
>>>>   ACC = (TP + TN) / (TP + FP + FN + TN)
>>>> F1 score (harmonic mean of precision and sensitivity):
>>>>   F1 = 2TP / (2TP + FP + FN)
>>>> Matthews correlation coefficient (MCC):
>>>>   MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
>>>> Informedness:
>>>>   TPR + SPC - 1
>>>> Markedness:
>>>>   PPV + NPV - 1
>>>>
>>>> Sources: Fawcett (2006) and Powers (2011).
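>>>>
>>>> For reference, a tiny Python helper (my own sketch, not from the ML code
>>>> base) for the MCC entry above, since it uses all four confusion-matrix
>>>> cells:
>>>>
>>>>     import math
>>>>
>>>>     def mcc(tp, tn, fp, fn):
>>>>         denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
>>>>         return (tp * tn - fp * fn) / denom if denom else 0.0
>>>>
>>>>     # e.g. 50 of 100 anomalies caught, 50 false alarms in 10,000 samples
>>>>     print(mcc(tp=50, tn=9850, fp=50, fn=50))  # ~0.495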
>>>>
>>>> Thanks and Regards,
>>>> Ashen
>>>> --
>>>> *Ashen Weerathunga*
>>>> Software Engineer - Intern
>>>> WSO2 Inc.: http://wso2.com
>>>> lean.enterprise.middleware
>>>>
>>>> Email: as...@wso2.com
>>>> Mobile: +94 716042995 <94716042995>
>>>> LinkedIn:
>>>> *http://lk.linkedin.com/in/ashenweerathunga
>>>> <http://lk.linkedin.com/in/ashenweerathunga>*
>>>>
>>>
>>>
>>>
>>> --
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> Sinnathamby Mahesan
>>>
>>>
>>>
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>
>>
>


-- 
*CD Athuraliya*
Software Engineer
WSO2, Inc.
lean . enterprise . middleware
Mobile: +94 716288847 <94716288847>
LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
<https://twitter.com/cdathuraliya> | Blog <http://cdathuraliya.tumblr.com/>
_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev
