Ashen, when you conclude this, can you write a blog article comparing the different methods and explaining why a given one is better?
--Srinath

On Thu, Sep 17, 2015 at 9:59 AM, Srinath Perera <srin...@wso2.com> wrote:

> Seshika and I were talking to a Forrester analyst, and he mentioned that the "Lorenz curve" is used in fraud cases.
>
> Please read up on what it is and how it compares to ROC etc.; see
> https://www.quora.com/What-is-the-difference-between-a-ROC-curve-and-a-precision-recall-curve-When-should-I-use-each

On Thu, Sep 17, 2015 at 9:07 AM, CD Athuraliya <chathur...@wso2.com> wrote:

> Hi Ashen,
>
> When selecting evaluation measures, please note the class imbalance that typically occurs in anomaly data (anomalous instances can be very infrequent compared to normal instances in a real-world dataset). Please check how this imbalance affects the evaluation measures; I found this paper [1] on the topic.
>
> And since the data clusters play a vital role in this model, it would be better if we could show some measures on them as well, IMO.
>
> [1] http://marmota.dlsi.uji.es/WebBIB/papers/2007/1_GarciaTamida2007.pdf
>
> Regards,
> CD

On Thu, Sep 17, 2015 at 6:18 AM, A. R. Weerasinghe <a...@ucsc.cmb.ac.lk> wrote:

> I'm sorry, that was a general answer.
>
> For anomaly detection, I'd say sensitivity or specificity (depending on what is positive and what is negative: Mahesan's point) is more important than the others.
>
> For example, in a data set of 10,000 samples where 100 are labeled positive (anomalous), a predictor that outputs "Negative" for every instance evaluates to Precision = 100%, Accuracy = 99%, and Specificity = 100%. This predictor would be entirely useless, yet these measures suggest it performs very well. The same predictor evaluates to Recall (sensitivity) = 0%. In this case, sensitivity seems most in tune with how well the classifier is actually performing.
>
> The other extreme is a data set where most of the examples are positive (normal).
> For example, if 9,900 out of 10,000 instances are positive and a classifier predicts positive on every instance, then Precision = 99%, Accuracy = 99%, Specificity = 0%, and Recall = 100%. In this case, specificity shows that the classifier is problematic.
>
> Hope this helps.

On Thu, Sep 17, 2015 at 6:05 AM, A. R. Weerasinghe <a...@ucsc.cmb.ac.lk> wrote:

> Usually the F1 measure and the area under the ROC curve.
>
> Ruvan.

On Thu, Sep 17, 2015 at 5:20 AM, Sinnathamby Mahesan <sinnatha...@wso2.com> wrote:

> Dear Ashen,
>
> Sensitivity - with a view to reducing false negatives
> Precision - with a view to reducing false positives
>
> The F1 score combines both as the harmonic mean of precision and sensitivity.
>
> That's why F1 is normally chosen, and it is simple: F1 = 2TP / (2TP + FN + FP)
>
> By the way, which do you consider a true positive:
> (a) Anomaly - Anomaly, or
> (b) Normal - Normal?
>
> I think case (a) is more suited with regard to your objective.
>
> Or, if you have trouble choosing which way:
>
> You could consider Accuracy (Acc), which is somewhat similar to F1 but gives the same weight to TP and TN:
> Acc = (TP + TN) / (TP + TN + FN + FP)
>
> Good luck

On 16 September 2015 at 15:35, Ashen Weerathunga <as...@wso2.com> wrote:

> Hi all,
>
> I am currently integrating the anomaly detection feature into ML. I have the problem of choosing the best accuracy measure for the model. I can get the confusion matrix, which consists of true positives, true negatives, false positives and false negatives. There are a few different measures such as sensitivity, accuracy, F1 score, etc. So what would be the best measure to report as the model accuracy for an anomaly detection model?
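Ruvan's two extreme cases above are easy to verify directly from the confusion-matrix counts. A minimal Python sketch (editorial illustration; the `metrics` helper is not from any WSO2 ML API):

```python
def metrics(tp, fp, tn, fn):
    """Derive the measures discussed above from confusion-matrix counts.

    Ratios with a zero denominator are undefined and returned as None
    (e.g. precision when the model never predicts positive).
    """
    def ratio(num, den):
        return num / den if den else None

    return {
        "precision":   ratio(tp, tp + fp),               # PPV
        "recall":      ratio(tp, tp + fn),               # sensitivity / TPR
        "specificity": ratio(tn, tn + fp),               # SPC / TNR
        "accuracy":    ratio(tp + tn, tp + fp + tn + fn),
    }

# Extreme 1: 100 anomalies in 10,000 samples, predictor always says "Negative".
# Accuracy (0.99) and specificity (1.0) look excellent, precision is undefined
# (no positive predictions at all), but recall = 0.0 exposes the useless model.
print(metrics(tp=0, fp=0, tn=9900, fn=100))

# Extreme 2: 9,900 of 10,000 instances are positive, predictor always says
# "Positive". Precision, accuracy and recall all look excellent;
# specificity = 0.0 is the measure that flags the problem.
print(metrics(tp=9900, fp=100, tn=0, fn=0))
```

Note that Ruvan's first example reports Precision = 100% by convention for the 0/0 case; the sketch returns None instead, to make the degenerate case explicit.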
> Some details about these measures [1]:
>
> Terminology and derivations from a confusion matrix:
> - true positive (TP): eqv. with hit
> - true negative (TN): eqv. with correct rejection
> - false positive (FP): eqv. with false alarm, Type I error
> - false negative (FN): eqv. with miss, Type II error
>
> - sensitivity, or true positive rate (TPR); eqv. with hit rate, recall: TPR = TP / P = TP / (TP + FN)
> - specificity (SPC), or true negative rate: SPC = TN / N = TN / (TN + FP)
> - precision, or positive predictive value (PPV): PPV = TP / (TP + FP)
> - negative predictive value (NPV): NPV = TN / (TN + FN)
> - fall-out, or false positive rate (FPR): FPR = FP / N = FP / (FP + TN) = 1 - SPC
> - false negative rate (FNR): FNR = FN / (TP + FN) = 1 - TPR
> - false discovery rate (FDR): FDR = FP / (TP + FP) = 1 - PPV
>
> - accuracy (ACC): ACC = (TP + TN) / (TP + FP + FN + TN)
> - F1 score, the harmonic mean of precision and sensitivity: F1 = 2TP / (2TP + FP + FN)
> - Matthews correlation coefficient (MCC): MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
> - Informedness: TPR + SPC - 1
> - Markedness: PPV + NPV - 1
>
> Sources: Fawcett (2006) and Powers (2011), via [1].
>
> [1] https://en.wikipedia.org/wiki/Sensitivity_and_specificity
>
> Thanks and Regards,
> Ashen
>
> --
> Ashen Weerathunga
> Software Engineer - Intern
> WSO2 Inc.: http://wso2.com
> lean.enterprise.middleware
>
> Email: as...@wso2.com
> Mobile: +94 716042995
> LinkedIn:
> http://lk.linkedin.com/in/ashenweerathunga

--
Sinnathamby Mahesan
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--
CD Athuraliya
Software Engineer
WSO2, Inc.
lean . enterprise . middleware
Mobile: +94 716288847
LinkedIn: http://lk.linkedin.com/in/cdathuraliya | Twitter: https://twitter.com/cdathuraliya | Blog: http://cdathuraliya.tumblr.com/

--
============================
Blog: http://srinathsview.blogspot.com  Twitter: @srinath_perera
Site: http://people.apache.org/~hemapani/
Photos: http://www.flickr.com/photos/hemapani/
Phone: 0772360902
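Of the composite measures in Ashen's table, F1 and MCC are the two most often suggested for imbalanced data: F1 ignores TN (useful when negatives dominate), while MCC uses all four cells of the confusion matrix. A minimal sketch of both, using the formulas quoted in the thread (function names are illustrative):

```python
import math

def f1_score(tp, fp, fn):
    # F1 = 2TP / (2TP + FP + FN): harmonic mean of precision and sensitivity.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else None

def mcc(tp, fp, tn, fn):
    # Matthews correlation coefficient: uses all four confusion-matrix cells.
    # Undefined (None) when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else None

# The always-"Negative" predictor from Ruvan's first example:
print(f1_score(tp=0, fp=0, fn=100))      # → 0.0 (F1 exposes the useless model)
print(mcc(tp=0, fp=0, tn=9900, fn=100))  # → None (no positive predictions)

# A perfect classifier on a balanced sample:
print(mcc(tp=50, fp=0, tn=50, fn=0))     # → 1.0
```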
_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev