https://spark.apache.org/docs/latest/mllib-clustering.html#k-means
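WSSSE is the measure used in the Spark k-means guide linked above. It is the sum, over all points, of the squared Euclidean distance from each point to its nearest cluster center. Spark MLlib's RDD-based `KMeansModel.computeCost` returns this cost. Below is a minimal pure-Python sketch of the measure. It assumes points and centers are equal-length tuples of floats. `wssse` and `squared_distance` are hypothetical helpers, not Spark or WSO2 ML APIs.

```python
from typing import Sequence, Tuple

# Assumption: a point or center is a tuple of floats; all share one dimension.
Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def wssse(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Within Set Sum of Squared Errors (hypothetical helper, not a Spark API).

    Each point contributes the squared distance to its *nearest* center,
    which is the center k-means would assign it to.
    """
    if not centers:
        raise ValueError("at least one cluster center is required")
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (9.0, 9.0), (10.0, 9.0)]
    ctrs = [(0.5, 0.0), (9.5, 9.0)]
    print(wssse(pts, ctrs))  # each point is 0.5 from its center: 4 * 0.25 = 1.0
```

Keep one caveat in mind when using WSSSE to compare models. On a fixed dataset, the best achievable WSSSE can only stay the same or shrink as the number of clusters k grows. A raw WSSSE ranking therefore tends to favor models with more clusters. It is most meaningful when comparing models with the same k on the same, identically scaled data.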

On Mon, Jul 13, 2015 at 12:03 PM, Nirmal Fernando wrote:

> Why can't we use Within Set Sum of Squared Error (WSSSE) as a measure of
> clustering?
>
>
> On Fri, May 15, 2015 at 4:34 PM, CD Athuraliya wrote:
>
>> Hi all,
>>
>> We have implemented model comparison for classification and numerical
>> prediction with the following measures:
>>
>> - Binary and multiclass classification - Accuracy
>> - Numerical prediction - Mean squared error
>>
>> We are currently working on a sorted view of models according to their
>> accuracy/MSE. This release will not support cross-comparison for clustering
>> algorithms.
>>
>> Thanks,
>> CD
>>
>> On Tue, May 5, 2015 at 5:41 PM, CD Athuraliya wrote:
>>
>>> Hi all,
>>>
>>> Which chart types and implementations are we going to proceed with for the
>>> alpha? We will be able to finalize the comparison and summary views with them.
>>>
>>> Thanks,
>>> CD
>>>
>>> On Fri, May 1, 2015 at 9:39 AM, Supun Sethunga wrote:
>>>
>>>> Hi Nirmal,
>>>>
>>>> During the last discussion, we decided to show a numerical value
>>>> (Accuracy / Std error) next to each model in the model listing view to
>>>> illustrate its performance, so that the user can get an overall idea at a
>>>> glance, and to have the ROC comparison on a separate page. I think we
>>>> still need to figure out where the latter would fit in the UI
>>>> navigation.
>>>>
>>>> Thanks,
>>>> Supun
>>>>
>>>> On Thu, Apr 30, 2015 at 6:51 PM, Nirmal Fernando wrote:
>>>>
>>>>> Thanks for summarizing, Supun. Did we think about how we are going to
>>>>> create the cross-model comparison view?
>>>>>
>>>>> On Thu, Apr 30, 2015 at 8:33 AM, Supun Sethunga wrote:
>>>>>
>>>>>> [-strategy@, +architecture@]
>>>>>>
>>>>>> On Thu, Apr 30, 2015 at 5:58 PM, Srinath Perera wrote:
>>>>>>
>>>>>>> should go to arch@
>>>>>>>
>>>>>>> On Thu, Apr 30, 2015 at 6:28 AM, Srinath Perera wrote:
>>>>>>>
>>>>>>>> Thanks Supun!! This looks good.
>>>>>>>>
>>>>>>>> --Srinath
>>>>>>>>
>>>>>>>> On Thu, Apr 30, 2015 at 6:25 AM, Supun Sethunga wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Following is the breakdown of the Model Summary illustrations that
>>>>>>>>> ML can support at the moment. I'm initiating this thread to finalize
>>>>>>>>> what we can and cannot support in the initial release.
>>>>>>>>> Blue-colored ones are yet to be implemented.
>>>>>>>>>
>>>>>>>>> - Numerical Prediction
>>>>>>>>>    - Standard Error [1]
>>>>>>>>>    - Residual Plot [2]
>>>>>>>>>    - Feature Importance (*Graph containing the weights assigned to
>>>>>>>>>      each feature in the model*)
>>>>>>>>>
>>>>>>>>> - Classification:
>>>>>>>>>    - Binary
>>>>>>>>>       - ROC [3]
>>>>>>>>>       - AUC
>>>>>>>>>       - Confusion Matrix (*Available on Spark as a static metric. If
>>>>>>>>>         it were calculated manually, it could be made interactive so
>>>>>>>>>         that the user can find the optimal threshold*)
>>>>>>>>>       - Accuracy
>>>>>>>>>       - Feature Importance
>>>>>>>>>    - Multi-Class
>>>>>>>>>       - Confusion Matrix (*Available on Spark*)
>>>>>>>>>       - Accuracy
>>>>>>>>>       - Feature Importance
>>>>>>>>>
>>>>>>>>> - Clustering
>>>>>>>>>    - Scatter plot with clustered points
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *Cross-comparing Models*
>>>>>>>>>
>>>>>>>>> As you can see, the major limitation we have when cross-comparing
>>>>>>>>> models within a project is that different categories have different
>>>>>>>>> summary statistics/plots. Hence we cannot compare two models from two
>>>>>>>>> different categories.
>>>>>>>>>
>>>>>>>>> Following are the possibilities:
>>>>>>>>>
>>>>>>>>> - ROC can be used to compare binary classification models.
>>>>>>>>> - Cobweb (a radar chart) can be used to compare multi-class
>>>>>>>>>   classification models. This is a possible alternative to ROC in the
>>>>>>>>>   multi-class case, but the drawback is that the graph becomes very
>>>>>>>>>   unclear when the models have a large number of features. [4] [5]
>>>>>>>>> - Accuracy can be used to compare all classification models.
>>>>>>>>>
>>>>>>>>> Please add anything I've missed.
>>>>>>>>>
>>>>>>>>> *Ref:*
>>>>>>>>> [1] http://onlinestatbook.com/2/regression/accuracy.html
>>>>>>>>> [2] http://stattrek.com/regression/residual-analysis.aspx
>>>>>>>>> [3] http://www.sciencedirect.com/science/article/pii/S016786550500303X
>>>>>>>>> [4] http://www.academia.edu/2519022/Visualization_and_analysis_of_classifiers_performance_in_multi-class_medical_data
>>>>>>>>> [5] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.107.8450&rep=rep1&type=pdf
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Supun
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> *Supun Sethunga*
>>>>>>>>> Software Engineer
>>>>>>>>> WSO2, Inc.
>>>>>>>>> http://wso2.com/
>>>>>>>>> lean | enterprise | middleware
>>>>>>>>> Mobile : +94 716546324
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> ============================
>>>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>>> Phone: 0772360902
>>>>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Thanks & regards,
>>>>> Nirmal
>>>>>
>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>> Mobile: +94715779733
>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>
>>>
>>> --
>>> *CD Athuraliya*
>>> Software Engineer
>>> WSO2, Inc.
>>> lean . enterprise . middleware
>>> Mobile: +94 716288847
>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
>>> <https://twitter.com/cdathuraliya> | Blog
>>> <http://cdathuraliya.tumblr.com/>
>>>


--
Thanks & regards,
Nirmal

Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
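CD's sorted view can be sketched as ranking the models of one category by their single summary number. Accuracy is sorted in descending order (higher is better) and mean squared error in ascending order (lower is better). Categories are never mixed, in line with Supun's point that models from different categories cannot be compared. This is a minimal Python sketch under assumed names. `ModelSummary`, its fields, the category strings, and `sorted_view` are all hypothetical, not WSO2 ML's actual data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelSummary:
    name: str
    category: str  # hypothetical labels: "classification" or "numerical_prediction"
    metric: float  # accuracy for classification, MSE for numerical prediction


def mean_squared_error(actual: List[float], predicted: List[float]) -> float:
    """Average of the squared residuals (actual - predicted)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def sorted_view(models: List[ModelSummary], category: str) -> List[ModelSummary]:
    """Rank the models of a single category, best first.

    Models of other categories are filtered out, because accuracy and MSE
    are not on a common scale and cannot be compared with each other.
    """
    same_category = [m for m in models if m.category == category]
    higher_is_better = category == "classification"  # accuracy: higher wins
    return sorted(same_category, key=lambda m: m.metric, reverse=higher_is_better)


if __name__ == "__main__":
    models = [
        ModelSummary("logistic-regression", "classification", 0.87),
        ModelSummary("decision-tree", "classification", 0.91),
        ModelSummary("linear-regression", "numerical_prediction",
                     mean_squared_error([3.0, 5.0], [2.0, 5.0])),
    ]
    print([m.name for m in sorted_view(models, "classification")])
    # ['decision-tree', 'logistic-regression']
```

Keeping the "higher or lower is better" decision inside `sorted_view` means the listing page only needs to know a model's category, not which metric it carries.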
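Supun suggests computing the binary confusion matrix manually so that it becomes interactive. That amounts to recomputing the matrix for whichever decision threshold the user picks. Sweeping every threshold yields the ROC curve, and the area under that curve is the AUC. Below is a minimal pure-Python sketch. It assumes each model outputs one score per instance, labels are 0/1, and a score at or above the threshold counts as a positive prediction. All function names are hypothetical, not Spark or WSO2 ML APIs.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), predicting positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def roc_points(scores: Sequence[float],
               labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) for each distinct threshold,
    from the strictest (nothing predicted positive) to the loosest."""
    positives = sum(1 for label in labels if label == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp, fp, _, _ = confusion_matrix(scores, labels, threshold)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.35, 0.3]
    labels = [1, 0, 1, 0]
    print(confusion_matrix(scores, labels, 0.5))  # (1, 1, 1, 1)
    print(auc(roc_points(scores, labels)))        # 0.75
```

Recomputing the matrix for every threshold makes this sketch quadratic in the number of instances, which is acceptable for illustration. Sorting once and accumulating counts would scale better. On Spark itself, MLlib's `BinaryClassificationMetrics` already computes the ROC curve and the area under it from (score, label) pairs.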
_______________________________________________
Architecture mailing list
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
