But again, WSSSE is useful only when comparing k-means models built with different values of k.
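To make that comparison concrete, here is a minimal sketch in plain Python (standard library only). It is not the Spark MLlib API and not our implementation; the helper names (`kmeans`, `wssse`, `make_blobs`) and the synthetic data are assumptions made purely for illustration. It trains k-means for several values of k on the same data and reports the WSSSE of each model.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm (hypothetical helper, not the MLlib one)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def wssse(points, centers):
    """Within Set Sum of Squared Errors: squared distance to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)

def make_blobs(seed=42):
    """Synthetic 2-D data: three Gaussian blobs (assumed data, illustration only)."""
    rng = random.Random(seed)
    true_centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
    return [
        (cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
        for cx, cy in true_centers
        for _ in range(50)
    ]

points = make_blobs()
# One model per k, all scored with the same WSSSE measure.
scores = {k: wssse(points, kmeans(points, k)) for k in range(1, 7)}
for k, score in scores.items():
    print(k, round(score, 2))
```

Since k-means minimizes exactly this quantity, WSSSE tends to keep falling as k grows, so the lowest value alone does not pick the best model. What can be read from it is where the decrease levels off across the different k values.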

On Tue, Jul 14, 2015 at 11:46 AM, Maheshakya Wijewardena <[email protected]> wrote:

> I'm not sure whether within-cluster sum of squared error would be a good
> metric for k-means, since what k-means does in its optimization is
> minimize that error [1]. Therefore, the result we get will always be good
> according to that measure. I think an internal validation method [2] that
> does not depend on the same optimization technique as k-means would be more
> suitable.
>
> [1] https://en.wikipedia.org/wiki/K-means_clustering#Description
> [2] http://www.universitypress.org.uk/journals/cc/20-463.pdf
>
> On Mon, Jul 13, 2015 at 12:04 PM, Nirmal Fernando <[email protected]> wrote:
>
>> https://spark.apache.org/docs/latest/mllib-clustering.html#k-means
>>
>> On Mon, Jul 13, 2015 at 12:03 PM, Nirmal Fernando <[email protected]>
>> wrote:
>>
>>> Why can't we use Within Set Sum of Squared Error (WSSSE) as a measure of
>>> clustering?
>>>
>>> On Fri, May 15, 2015 at 4:34 PM, CD Athuraliya <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We have implemented model comparison for classification and numerical
>>>> prediction with the following measures.
>>>>
>>>> - Binary and multiclass classification - Accuracy
>>>> - Numerical prediction - Mean squared error
>>>>
>>>> We are currently working on a sorted view of models according to their
>>>> accuracy/MSE. This release will not support cross comparison for
>>>> clustering algorithms.
>>>>
>>>> Thanks,
>>>> CD
>>>>
>>>> On Tue, May 5, 2015 at 5:41 PM, CD Athuraliya <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> With what chart types and implementations are we going to proceed for
>>>>> alpha? We will be able to finalize the comparison and summary views
>>>>> with them.
>>>>>
>>>>> Thanks,
>>>>> CD
>>>>>
>>>>> On Fri, May 1, 2015 at 9:39 AM, Supun Sethunga <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Nirmal,
>>>>>>
>>>>>> During the last discussion, what we decided was to show some
>>>>>> numerical value (Accuracy / Std error) next to each model to
>>>>>> illustrate the performance in the model listing view, so that the
>>>>>> user can get an overall idea at one glance. And in a separate page,
>>>>>> have the ROC comparison. Think we still need to figure out where the
>>>>>> latter would fit in the UI navigation.
>>>>>>
>>>>>> Thanks,
>>>>>> Supun
>>>>>>
>>>>>> On Thu, Apr 30, 2015 at 6:51 PM, Nirmal Fernando <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks for summarizing, Supun. Did we think about how we're going to
>>>>>>> create the cross-model comparison view?
>>>>>>>
>>>>>>> On Thu, Apr 30, 2015 at 8:33 AM, Supun Sethunga <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> [-strategy@, +architecture@]
>>>>>>>>
>>>>>>>> On Thu, Apr 30, 2015 at 5:58 PM, Srinath Perera <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> should go to arch@
>>>>>>>>>
>>>>>>>>> On Thu, Apr 30, 2015 at 6:28 AM, Srinath Perera <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Supun!! this looks good.
>>>>>>>>>>
>>>>>>>>>> --Srinath
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 30, 2015 at 6:25 AM, Supun Sethunga <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> Following is the breakdown of the Model Summary illustrations
>>>>>>>>>>> that can be supported by ML at the moment. Initiating this
>>>>>>>>>>> thread to finalize what we can and cannot support with the
>>>>>>>>>>> initial release. Blue colored ones are yet to be implemented.
>>>>>>>>>>>
>>>>>>>>>>> - Numerical Prediction
>>>>>>>>>>>    - Standard Error [1]
>>>>>>>>>>>    - Residual Plot [2]
>>>>>>>>>>>    - Feature Importance (*Graph containing weights assigned
>>>>>>>>>>>      to each of the features in the model*)
>>>>>>>>>>>
>>>>>>>>>>> - Classification:
>>>>>>>>>>>    - Binary
>>>>>>>>>>>       - ROC [3]
>>>>>>>>>>>       - AUC
>>>>>>>>>>>       - Confusion Matrix (*Available on spark as a static
>>>>>>>>>>>         metric. But if this was calculated manually, it can be
>>>>>>>>>>>         made interactive, so that user can find the optimal
>>>>>>>>>>>         threshold*)
>>>>>>>>>>>       - Accuracy
>>>>>>>>>>>       - Feature Importance
>>>>>>>>>>>    - Multi-Class
>>>>>>>>>>>       - Confusion Matrix (*Available on spark*)
>>>>>>>>>>>       - Accuracy
>>>>>>>>>>>       - Feature Importance
>>>>>>>>>>>
>>>>>>>>>>> - Clustering
>>>>>>>>>>>    - Scatter plot with clustered points
>>>>>>>>>>>
>>>>>>>>>>> *Cross-comparing Models*
>>>>>>>>>>>
>>>>>>>>>>> As you can see, the major limitation we have when cross
>>>>>>>>>>> comparing models within a project is that different categories
>>>>>>>>>>> have different summary statistics/plots, and hence we cannot
>>>>>>>>>>> compare two models in two categories.
>>>>>>>>>>>
>>>>>>>>>>> Following are the possibilities:
>>>>>>>>>>>
>>>>>>>>>>> - ROC can be used to compare Binary classification models.
>>>>>>>>>>> - Cobweb (a radar chart) can be used to compare Multi-Class
>>>>>>>>>>>   classification models (This is the possible alternative for
>>>>>>>>>>>   ROC in the multi-class case. But the drawback is, the graph
>>>>>>>>>>>   will be very unclear when there are excess amounts of
>>>>>>>>>>>   features in the models). [4] [5]
>>>>>>>>>>> - Accuracy can be used to compare all classification models.
>>>>>>>>>>>
>>>>>>>>>>> Please add if I've missed anything.
>>>>>>>>>>>
>>>>>>>>>>> *Ref:*
>>>>>>>>>>> [1] http://onlinestatbook.com/2/regression/accuracy.html
>>>>>>>>>>> [2] http://stattrek.com/regression/residual-analysis.aspx
>>>>>>>>>>> [3] http://www.sciencedirect.com/science/article/pii/S016786550500303X
>>>>>>>>>>> [4] http://www.academia.edu/2519022/Visualization_and_analysis_of_classifiers_performance_in_multi-class_medical_data
>>>>>>>>>>> [5] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.107.8450&rep=rep1&type=pdf
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Supun
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> *Supun Sethunga*
>>>>>>>>>>> Software Engineer
>>>>>>>>>>> WSO2, Inc.
>>>>>>>>>>> http://wso2.com/
>>>>>>>>>>> lean | enterprise | middleware
>>>>>>>>>>> Mobile : +94 716546324
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> ============================
>>>>>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>>>>> Phone: 0772360902
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Thanks & regards,
>>>>>>> Nirmal
>>>>>>>
>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>> Mobile: +94715779733
>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>
>>>>> --
>>>>> *CD Athuraliya*
>>>>> Software Engineer
>>>>> WSO2, Inc.
>>>>> lean . enterprise . middleware
>>>>> Mobile: +94 716288847
>>>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
>>>>> <https://twitter.com/cdathuraliya> | Blog
>>>>> <http://cdathuraliya.tumblr.com/>
>>
>> _______________________________________________
>> Architecture mailing list
>> [email protected]
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture

--
Pruthuvi Maheshakya Wijewardena
Software Engineer
WSO2 : http://wso2.com/
Email: [email protected]
Mobile: +94711228855
