Yes, it seems so. I've removed the WSSSE calculation code since it is not currently used.
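For context, here is a minimal sketch of the WSSSE measure discussed in the thread below. It is plain Python, not the Spark MLlib code that was removed; the `wssse` and `kmeans` helpers and the toy points are hypothetical and exist only for illustration.

```python
from math import dist  # Python 3.8+


def wssse(points, centers):
    """Within Set Sum of Squared Errors: squared distance of each point
    to its nearest center, summed over all points."""
    return sum(min(dist(p, c) ** 2 for c in centers) for p in points)


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm. Seeding with the first k points is an
    arbitrary, deterministic choice made for this sketch."""
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers


# Hypothetical toy data: two tight groups of three points each.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
          (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]

# WSSSE of the fitted model for k = 1, 2, 3.
errors = {k: wssse(points, kmeans(points, k)) for k in (1, 2, 3)}
```

On this toy data the error shrinks each time k grows, which is why a low WSSSE on its own says little about clustering quality, although comparing it across different k values can still be informative.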
On Tue, Jul 14, 2015 at 11:56 AM, Maheshakya Wijewardena <[email protected]> wrote:

> But again, WSSSE can (only) be useful when comparing the models of
> different k values in the k-means algorithm.
>
> On Tue, Jul 14, 2015 at 11:46 AM, Maheshakya Wijewardena <[email protected]> wrote:
>
>> I'm not sure whether within-cluster sum of squared error would be a good
>> metric for k-means, since what k-means does in its optimization is
>> minimize that error [1]. Therefore, the result we get will always be good
>> according to that measure. I think an internal validation method [2] that
>> does not depend on the same optimization technique as k-means would be
>> more suitable.
>>
>> [1] https://en.wikipedia.org/wiki/K-means_clustering#Description
>> [2] http://www.universitypress.org.uk/journals/cc/20-463.pdf
>>
>> On Mon, Jul 13, 2015 at 12:04 PM, Nirmal Fernando <[email protected]> wrote:
>>
>>> https://spark.apache.org/docs/latest/mllib-clustering.html#k-means
>>>
>>> On Mon, Jul 13, 2015 at 12:03 PM, Nirmal Fernando <[email protected]> wrote:
>>>
>>>> Why can't we use Within Set Sum of Squared Error (WSSSE) as a measure
>>>> of clustering?
>>>>
>>>> On Fri, May 15, 2015 at 4:34 PM, CD Athuraliya <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> We have implemented model comparison for classification and numerical
>>>>> prediction with the following measures.
>>>>>
>>>>> - Binary and multiclass classification - Accuracy
>>>>> - Numerical prediction - Mean squared error
>>>>>
>>>>> We are currently working on a sorted view of models according to their
>>>>> accuracy/MSE. This release will not support cross comparison for
>>>>> clustering algorithms.
>>>>>
>>>>> Thanks,
>>>>> CD
>>>>>
>>>>> On Tue, May 5, 2015 at 5:41 PM, CD Athuraliya <[email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> With what chart types and implementations are we going to proceed for
>>>>>> alpha?
>>>>>> We will be able to finalize the comparison and summary views with
>>>>>> them.
>>>>>>
>>>>>> Thanks,
>>>>>> CD
>>>>>>
>>>>>> On Fri, May 1, 2015 at 9:39 AM, Supun Sethunga <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Nirmal,
>>>>>>>
>>>>>>> During the last discussion, we decided to show a numerical value
>>>>>>> (Accuracy / Std error) next to each model in the model listing view
>>>>>>> to illustrate its performance, so that the user can get an overall
>>>>>>> idea at a glance, and to have the ROC comparison on a separate page.
>>>>>>> I think we still need to figure out where the latter would fit in
>>>>>>> the UI navigation.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Supun
>>>>>>>
>>>>>>> On Thu, Apr 30, 2015 at 6:51 PM, Nirmal Fernando <[email protected]> wrote:
>>>>>>>
>>>>>>>> Thanks for summarizing, Supun. Did we think about how we are going
>>>>>>>> to create the cross-model comparison view?
>>>>>>>>
>>>>>>>> On Thu, Apr 30, 2015 at 8:33 AM, Supun Sethunga <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> [-strategy@, +architecture@]
>>>>>>>>>
>>>>>>>>> On Thu, Apr 30, 2015 at 5:58 PM, Srinath Perera <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> should go to arch@
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 30, 2015 at 6:28 AM, Srinath Perera <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Supun!! This looks good.
>>>>>>>>>>>
>>>>>>>>>>> --Srinath
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Apr 30, 2015 at 6:25 AM, Supun Sethunga <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> Following is the breakdown of the Model Summary illustrations
>>>>>>>>>>>> that can be supported by ML at the moment. I am initiating this
>>>>>>>>>>>> thread to finalize what we can and cannot support in the
>>>>>>>>>>>> initial release.
>>>>>>>>>>>> Blue colored ones are yet to be implemented.
>>>>>>>>>>>>
>>>>>>>>>>>> - Numerical Prediction
>>>>>>>>>>>>   - Standard Error [1]
>>>>>>>>>>>>   - Residual Plot [2]
>>>>>>>>>>>>   - Feature Importance (*graph containing the weights assigned
>>>>>>>>>>>>     to each of the features in the model*)
>>>>>>>>>>>>
>>>>>>>>>>>> - Classification:
>>>>>>>>>>>>   - Binary
>>>>>>>>>>>>     - ROC [3]
>>>>>>>>>>>>     - AUC
>>>>>>>>>>>>     - Confusion Matrix (*available on Spark as a static metric,
>>>>>>>>>>>>       but if this were calculated manually, it could be made
>>>>>>>>>>>>       interactive, so that the user can find the optimal
>>>>>>>>>>>>       threshold*)
>>>>>>>>>>>>     - Accuracy
>>>>>>>>>>>>     - Feature Importance
>>>>>>>>>>>>   - Multi-Class
>>>>>>>>>>>>     - Confusion Matrix (*available on Spark*)
>>>>>>>>>>>>     - Accuracy
>>>>>>>>>>>>     - Feature Importance
>>>>>>>>>>>>
>>>>>>>>>>>> - Clustering
>>>>>>>>>>>>   - Scatter plot with clustered points
>>>>>>>>>>>>
>>>>>>>>>>>> *Cross-comparing Models*
>>>>>>>>>>>>
>>>>>>>>>>>> As you can see, the major limitation we have when cross-comparing
>>>>>>>>>>>> models within a project is that different categories have
>>>>>>>>>>>> different summary statistics/plots, and hence we cannot compare
>>>>>>>>>>>> two models from two different categories.
>>>>>>>>>>>>
>>>>>>>>>>>> Following are the possibilities:
>>>>>>>>>>>>
>>>>>>>>>>>> - ROC can be used to compare Binary classification models.
>>>>>>>>>>>> - Cobweb (a radar chart) can be used to compare Multi-Class
>>>>>>>>>>>>   classification models. (This is a possible alternative to ROC
>>>>>>>>>>>>   in the multi-class case. The drawback is that the graph
>>>>>>>>>>>>   becomes very unclear when the models have a large number of
>>>>>>>>>>>>   features.) [4] [5]
>>>>>>>>>>>> - Accuracy can be used to compare all classification models.
>>>>>>>>>>>>
>>>>>>>>>>>> Please add if I've missed anything.
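
The idea of a manually calculated, threshold-dependent confusion matrix for binary classifiers can be sketched roughly as follows. This is plain Python, not the Spark metric mentioned above; the function names, scores, labels and candidate thresholds are all hypothetical.

```python
def confusion_matrix(scores, labels, threshold):
    """Return (tp, fp, tn, fn) for binary labels (1 = positive),
    predicting positive whenever score >= threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of predictions that match the true label."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical classifier scores and true labels.
scores = [0.1, 0.4, 0.5, 0.8, 0.7, 0.2]
labels = [0, 0, 1, 1, 1, 0]

# Recompute the matrix at each candidate threshold and keep the most accurate.
by_threshold = {
    t: accuracy(*confusion_matrix(scores, labels, t))
    for t in (0.25, 0.45, 0.75)
}
best_threshold = max(by_threshold, key=by_threshold.get)
```

An interactive view would recompute the four counts as the user moves the threshold. Accuracy is used here only because it is the comparison measure named in this thread; another criterion could be substituted.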
>>>>>>>>>>>>
>>>>>>>>>>>> *Ref:*
>>>>>>>>>>>> [1] http://onlinestatbook.com/2/regression/accuracy.html
>>>>>>>>>>>> [2] http://stattrek.com/regression/residual-analysis.aspx
>>>>>>>>>>>> [3] http://www.sciencedirect.com/science/article/pii/S016786550500303X
>>>>>>>>>>>> [4] http://www.academia.edu/2519022/Visualization_and_analysis_of_classifiers_performance_in_multi-class_medical_data
>>>>>>>>>>>> [5] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.107.8450&rep=rep1&type=pdf
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Supun
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> *Supun Sethunga*
>>>>>>>>>>>> Software Engineer
>>>>>>>>>>>> WSO2, Inc.
>>>>>>>>>>>> http://wso2.com/
>>>>>>>>>>>> lean | enterprise | middleware
>>>>>>>>>>>> Mobile : +94 716546324
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> ============================
>>>>>>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>>>>>> Phone: 0772360902
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks & regards,
>>>>>>>> Nirmal
>>>>>>>>
>>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>>> Mobile: +94715779733
>>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>
>>>>>> --
>>>>>> *CD Athuraliya*
>>>>>> Software Engineer
>>>>>> WSO2, Inc.
>>>>>> lean . enterprise . middleware
>>>>>> Mobile: +94 716288847
>>>>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter <https://twitter.com/cdathuraliya> | Blog <http://cdathuraliya.tumblr.com/>
>>>
>>> --
>>> Thanks & regards,
>>> Nirmal
>>>
>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>> Mobile: +94715779733
>>> Blog: http://nirmalfdo.blogspot.com/
>>>
>>> _______________________________________________
>>> Architecture mailing list
>>> [email protected]
>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>
>> --
>> Pruthuvi Maheshakya Wijewardena
>> Software Engineer
>> WSO2 : http://wso2.com/
>> Email: [email protected]
>> Mobile: +94711228855

--
Thanks & regards,
Nirmal

Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
