I'm not sure that the within-cluster sum of squared errors (the WSSSE you suggested) would be a good metric for k-means, since that error is exactly what k-means minimizes in its optimization [1]. A k-means result will therefore always look good according to that measure. I think an internal validation method [2] that does not depend on the k-means objective would be more suitable.
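A minimal sketch of the point above, in plain Python with no Spark dependency. `kmeans`, `wssse` and `silhouette` are hypothetical helpers written only for this illustration. The k-means variant (Lloyd's algorithm with a deterministic farthest-point seed) and the toy data are assumptions. The silhouette coefficient is just one example of an internal validation index, not necessarily the one proposed in [2]. On three well-separated blobs, WSSSE keeps shrinking as k grows, so on its own it favors more clusters, while the silhouette score peaks at k = 3.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(members: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(members) for coord in zip(*members))


def kmeans(points: List[Point], k: int, iters: int = 100) -> List[int]:
    """Lloyd's k-means with deterministic farthest-point seeding; returns one label per point."""
    # Seed with the first point, then repeatedly add the point farthest from its nearest centroid.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (lowest index wins ties).
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members (keep it if the cluster is empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean_point(members)
    return labels


def wssse(points: List[Point], labels: List[int]) -> float:
    """Within-cluster sum of squared errors: exactly the quantity k-means minimizes."""
    total = 0.0
    for j in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == j]
        center = mean_point(members)
        total += sum(sq_dist(p, center) for p in members)
    return total


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient (Euclidean); needs >= 2 clusters, singletons score 0."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, grouped by that point's cluster.
        by_cluster: Dict[int, List[float]] = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.get(labels[i])
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within p's own cluster
        b = min(sum(d) / len(d) for lab, d in by_cluster.items() if lab != labels[i])  # nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: three well-separated 2-D blobs of four points each.
blobs: List[Point] = [(0, 0), (0, 1), (1, 0), (1, 1),
                      (10, 0), (10, 1), (11, 0), (11, 1),
                      (0, 10), (0, 11), (1, 10), (1, 11)]

for k in (2, 3, 4):
    labels = kmeans(blobs, k)
    print(f"k={k}  WSSSE={wssse(blobs, labels):.2f}  silhouette={silhouette(blobs, labels):.2f}")
```

Here WSSSE drops at every step (206.00, 6.00, 5.33 for k = 2, 3, 4) even though k = 4 just splits a natural blob. Because the silhouette uses only pairwise distances and cluster labels, it could also compare k-means against clustering algorithms that do not optimize WSSSE at all.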
[1] https://en.wikipedia.org/wiki/K-means_clustering#Description
[2] http://www.universitypress.org.uk/journals/cc/20-463.pdf

On Mon, Jul 13, 2015 at 12:04 PM, Nirmal Fernando <[email protected]> wrote:

> https://spark.apache.org/docs/latest/mllib-clustering.html#k-means
>
> On Mon, Jul 13, 2015 at 12:03 PM, Nirmal Fernando <[email protected]> wrote:
>
>> Why can't we use Within Set Sum of Squared Error (WSSSE) as a measure of clustering?
>>
>> On Fri, May 15, 2015 at 4:34 PM, CD Athuraliya <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> We have implemented model comparison for classification and numerical prediction with the following measures:
>>>
>>> - Binary and multiclass classification - Accuracy
>>> - Numerical prediction - Mean squared error
>>>
>>> We are currently working on a sorted view of models according to their accuracy/MSE. This release will not support cross-comparison for clustering algorithms.
>>>
>>> Thanks,
>>> CD
>>>
>>> On Tue, May 5, 2015 at 5:41 PM, CD Athuraliya <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Which chart types and implementations are we going to proceed with for alpha? We will be able to finalize the comparison and summary views with them.
>>>>
>>>> Thanks,
>>>> CD
>>>>
>>>> On Fri, May 1, 2015 at 9:39 AM, Supun Sethunga <[email protected]> wrote:
>>>>
>>>>> Hi Nirmal,
>>>>>
>>>>> During the last discussion, we decided to show a numerical value (Accuracy / Std error) next to each model in the model listing view to illustrate its performance, so that the user can get an overall idea at a glance, and to have the ROC comparison on a separate page. I think we still need to figure out where the latter would fit in the UI navigation.
>>>>>
>>>>> Thanks,
>>>>> Supun
>>>>>
>>>>> On Thu, Apr 30, 2015 at 6:51 PM, Nirmal Fernando <[email protected]> wrote:
>>>>>
>>>>>> Thanks for summarizing, Supun.
>>>>>> Did we think about how we're going to create the cross-model comparisons view?
>>>>>>
>>>>>> On Thu, Apr 30, 2015 at 8:33 AM, Supun Sethunga <[email protected]> wrote:
>>>>>>
>>>>>>> [-strategy@, +architecture@]
>>>>>>>
>>>>>>> On Thu, Apr 30, 2015 at 5:58 PM, Srinath Perera <[email protected]> wrote:
>>>>>>>
>>>>>>>> should go to arch@
>>>>>>>>
>>>>>>>> On Thu, Apr 30, 2015 at 6:28 AM, Srinath Perera <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thanks Supun!! This looks good.
>>>>>>>>>
>>>>>>>>> --Srinath
>>>>>>>>>
>>>>>>>>> On Thu, Apr 30, 2015 at 6:25 AM, Supun Sethunga <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> Following is the breakdown of the Model Summary illustrations that ML can support at the moment. I'm initiating this thread to finalize what we can and cannot support in the initial release. The blue-colored ones are yet to be implemented.
>>>>>>>>>>
>>>>>>>>>> - Numerical Prediction
>>>>>>>>>>   - Standard Error [1]
>>>>>>>>>>   - Residual Plot [2]
>>>>>>>>>>   - Feature Importance (*Graph containing the weight assigned to each feature in the model*)
>>>>>>>>>>
>>>>>>>>>> - Classification:
>>>>>>>>>>   - Binary
>>>>>>>>>>     - ROC [3]
>>>>>>>>>>     - AUC
>>>>>>>>>>     - Confusion Matrix (*Available on Spark as a static metric. But if this were calculated manually, it could be made interactive, so that the user can find the optimal threshold*)
>>>>>>>>>>     - Accuracy
>>>>>>>>>>     - Feature Importance
>>>>>>>>>>   - Multi-Class
>>>>>>>>>>     - Confusion Matrix (*Available on Spark*)
>>>>>>>>>>     - Accuracy
>>>>>>>>>>     - Feature Importance
>>>>>>>>>>
>>>>>>>>>> - Clustering
>>>>>>>>>>   - Scatter plot with clustered points
>>>>>>>>>>
>>>>>>>>>> *Cross-comparing Models*
>>>>>>>>>>
>>>>>>>>>> As you can see, the major limitation when cross-comparing models within a project is that different categories have different summary statistics/plots, so we cannot compare two models from two different categories.
>>>>>>>>>>
>>>>>>>>>> Following are the possibilities:
>>>>>>>>>>
>>>>>>>>>> - ROC can be used to compare binary classification models.
>>>>>>>>>> - Cobweb (a radar chart) can be used to compare multi-class classification models. This is the possible alternative to ROC in the multi-class case, but the drawback is that the graph becomes very unclear when the models have a large number of features. [4] [5]
>>>>>>>>>> - Accuracy can be used to compare all classification models.
>>>>>>>>>>
>>>>>>>>>> Please add if I've missed anything.
>>>>>>>>>>
>>>>>>>>>> *Ref:*
>>>>>>>>>> [1] http://onlinestatbook.com/2/regression/accuracy.html
>>>>>>>>>> [2] http://stattrek.com/regression/residual-analysis.aspx
>>>>>>>>>> [3] http://www.sciencedirect.com/science/article/pii/S016786550500303X
>>>>>>>>>> [4] http://www.academia.edu/2519022/Visualization_and_analysis_of_classifiers_performance_in_multi-class_medical_data
>>>>>>>>>> [5] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.107.8450&rep=rep1&type=pdf
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Supun
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> *Supun Sethunga*
>>>>>>>>>> Software Engineer
>>>>>>>>>> WSO2, Inc.
>>>>>>>>>> http://wso2.com/
>>>>>>>>>> lean | enterprise | middleware
>>>>>>>>>> Mobile : +94 716546324
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> ============================
>>>>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>>>> Phone: 0772360902
>>>>>>
>>>>>> --
>>>>>> Thanks & regards,
>>>>>> Nirmal
>>>>>>
>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>> Mobile: +94715779733
>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>
>>>> --
>>>> *CD Athuraliya*
>>>> Software Engineer
>>>> WSO2, Inc.
>>>> lean . enterprise . middleware
>>>> Mobile: +94 716288847
>>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter <https://twitter.com/cdathuraliya> | Blog <http://cdathuraliya.tumblr.com/>
>
> _______________________________________________
> Architecture mailing list
> [email protected]
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture

--
Pruthuvi Maheshakya Wijewardena
Software Engineer
WSO2 : http://wso2.com/
Email: [email protected]
Mobile: +94711228855
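Supun's note in the quoted thread, that a confusion matrix computed manually (rather than taken as Spark's static metric) could be made interactive so the user can find the optimal threshold, could look roughly like the plain-Python sketch below. The function names `confusion_at` and `best_threshold`, the scores and labels, and the choice of accuracy as the criterion for "optimal" are all assumptions for illustration; this is not WSO2 ML or Spark MLlib code, and another criterion (for example Youden's J) could be swapped in.

```python
from typing import Sequence, Tuple


def confusion_at(scores: Sequence[float], labels: Sequence[int], threshold: float) -> Tuple[int, int, int, int]:
    """Binary confusion matrix (tp, fp, fn, tn), predicting positive when score >= threshold."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def best_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Try each distinct score as the cut-off; return (threshold, accuracy) with the highest accuracy."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted(set(scores)):
        tp, fp, fn, tn = confusion_at(scores, labels, t)
        acc = (tp + tn) / len(labels)
        if acc > best_acc:  # strict '>' keeps the lowest threshold on ties
            best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical model scores (probability of the positive class) and true labels.
scores = [0.1, 0.35, 0.4, 0.8, 0.65, 0.9, 0.2, 0.55]
labels = [0, 0, 1, 1, 0, 1, 0, 1]

# An interactive view would recompute the matrix whenever the user moves a threshold slider.
for t in (0.3, 0.5, 0.7):
    print(t, confusion_at(scores, labels, t))
print(best_threshold(scores, labels))  # (0.4, 0.875)
```

Each threshold also yields one point of the ROC curve, with false positive rate fp / (fp + tn) and true positive rate tp / (tp + fn), so the same sweep could feed the ROC comparison proposed for binary models.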
