Hi Maheshakya,

We'll be adding a cluster diagram to the model summary for clustering algorithms. Please suggest any other useful evaluation metrics.

Thanks

On Thu, May 28, 2015 at 11:58 AM, Maheshakya Wijewardena <[email protected]> wrote:

> Nice.
>
> In addition to charts for classification, I think we need some visualization method for clustering as well, since there's nothing to show after clustering models are trained. Maybe a chart with respect to two selected attributes.
>
> On Thu, May 28, 2015 at 11:46 AM, CD Athuraliya <[email protected]> wrote:
>
>> Hi all,
>>
>> A residual plot has been added for numerical prediction algorithms. Using standard chart types as much as possible is better IMO. It will reduce user confusion in understanding visualizations. I think we need to look for some standard chart types for classification algorithms (both binary and multiclass) as well [1].
>>
>> [1] http://oobaloo.co.uk/visualising-classifier-results-with-ggplot2
>>
>> Thanks
>>
>> On Wed, May 27, 2015 at 5:38 AM, Srinath Perera <[email protected]> wrote:
>>
>>> +1 shall we try those?
>>>
>>> On 26 May 2015 22:52, "Upul Bandara" <[email protected]> wrote:
>>>
>>>> +1 for residual plots.
>>>>
>>>> Though I haven't used it myself, the residual plot is a useful diagnostic tool for regression models. In particular, non-linearity in regression models can be easily identified using it.
>>>>
>>>> The book "An Introduction to Statistical Learning" [1] (pages 92-96) contains some useful information about residual plots.
>>>>
>>>> [1] http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Fourth%20Printing.pdf
>>>>
>>>> On Tue, May 26, 2015 at 8:47 PM, Supun Sethunga <[email protected]> wrote:
>>>>
>>>>> Hi CD,
>>>>>
>>>>> As came up in the offline discussion as well, IMHO this plot may not be the best option for classification. For regression, though, we can use it with a slight modification: take the difference between the predicted and actual values (rather than the values themselves) and plot that against a predictor variable (just as it's done atm). We can also add a third variable (a categorical feature) to color the points. This is a standard plot (AKA residual plot) that is usually used to evaluate regression models.
>>>>>
>>>>> One other thing we can try is doing the same for classification, i.e. taking the difference between the actual probability (0 or 1) and the predicted probability, plotting that, and seeing whether it gives a better overall picture. Not sure how it will come out though :) If it comes out right, then any point that lies above 0.5 (or the threshold we used) is wrongly classified, and hence we can get a rough idea of the x-axis feature values for which points get wrongly classified. I mean, we should be able to see a pattern, if one exists.
>>>>>
>>>>> Thanks,
>>>>> Supun
>>>>>
>>>>> On Tue, May 26, 2015 at 6:08 PM, CD Athuraliya <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Plotting predicted and actual values against a feature doesn't look very intuitive, especially for non-probabilistic models. Please check the attachments. Any thoughts on making this visualization better?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Fri, May 22, 2015 at 3:27 PM, Srinath Perera <[email protected]> wrote:
>>>>>>
>>>>>>> Yes, rerunning using a random sample from the test data is OK.
>>>>>>>
>>>>>>> --Srinath
>>>>>>>
>>>>>>> On Fri, May 22, 2015 at 2:28 PM, CD Athuraliya <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Srinath,
>>>>>>>>
>>>>>>>> Still, that random sample will not correspond to the predicted vs. actual values in the test results, given that there is no mapping between the random sample data points and the test result points. One thing we can do is run the test separately (using the same model) on the sampled data for the sole purpose of visualization. Any other options?
>>>>>>>>
>>>>>>>> On Fri, May 22, 2015 at 2:06 PM, Srinath Perera <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi CD,
>>>>>>>>>
>>>>>>>>> Can we take a random sample from the test data and use that for this process?
>>>>>>>>>
>>>>>>>>> --Srinath
>>>>>>>>>
>>>>>>>>> On Fri, May 22, 2015 at 12:00 PM, CD Athuraliya <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> To implement $subject in ML we need all feature values of the dataset against the predicted and actual values for the test data. But Spark only returns predicted and actual values as test results. Right now we use 10,000 random data rows for other visualizations, and we cannot use the same data for this visualization since those 10,000 random rows do not correspond to the test data (the test data is subtracted from the dataset according to the train data fraction at the model building stage).
>>>>>>>>>>
>>>>>>>>>> One option is to persist the test data at the testing stage, but it can be too large for some datasets, depending on the train data fraction. I'd appreciate your comments on this.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> CD
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> *CD Athuraliya*
>>>>>>>>>> Software Engineer
>>>>>>>>>> WSO2, Inc.
>>>>>>>>>> lean . enterprise . middleware
>>>>>>>>>> Mobile: +94 716288847
>>>>>>>>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter <https://twitter.com/cdathuraliya> | Blog <http://cdathuraliya.tumblr.com/>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> ============================
>>>>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>>>> Phone: 0772360902
>>>>>
>>>>> --
>>>>> *Supun Sethunga*
>>>>> Software Engineer
>>>>> WSO2, Inc.
>>>>> http://wso2.com/
>>>>> lean | enterprise | middleware
>>>>> Mobile : +94 716546324
>>>>
>>>> --
>>>> Upul Bandara,
>>>> Associate Technical Lead, WSO2, Inc.,
>>>> Mob: +94 715 468 345.
>
> --
> Pruthuvi Maheshakya Wijewardena
> Software Engineer
> WSO2 Lanka (Pvt) Ltd
> Email: [email protected]
> Mobile: +94711228855

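
For reference, the residual idea discussed in this thread (rerun the model on a random sample of the test data so that feature values stay attached to predictions, then plot predicted minus actual against one feature) could be sketched roughly as below. This is only an illustration, not the WSO2 ML or Spark implementation: the function names `sample_rows`, `residual_points` and `misclassified`, the `(features, actual)` row layout, and the use of the absolute residual for the classification check are all assumptions made for the example.

```python
import random


def sample_rows(rows, k, seed=0):
    """Return up to k rows drawn at random without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    if k >= len(rows):
        return list(rows)
    return rng.sample(rows, k)


def residual_points(rows, predict, feature_index):
    """Build (feature value, residual) pairs for a residual plot.

    Each row is assumed to be (features, actual). The residual is
    predicted minus actual, so the feature value stays paired with it.
    """
    points = []
    for features, actual in rows:
        predicted = predict(features)
        points.append((features[feature_index], predicted - actual))
    return points


def misclassified(points, threshold=0.5):
    """For 0/1 labels and predicted probabilities, keep the points whose
    absolute residual exceeds the threshold (assumed reading of the thread)."""
    return [(x, r) for x, r in points if abs(r) > threshold]


if __name__ == "__main__":
    # Toy test data and a toy model, both made up for illustration.
    test_rows = [([1.0], 2.0), ([2.0], 4.5), ([3.0], 5.0)]
    toy_model = lambda features: 2.0 * features[0]
    sampled = sample_rows(test_rows, 10000)
    print(residual_points(sampled, toy_model, 0))
```

Because the model is rerun on the sampled rows themselves, nothing needs to be persisted beyond the sample, which is the trade-off raised earlier in the thread against persisting the whole test set.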
_______________________________________________ Dev mailing list [email protected] http://wso2.org/cgi-bin/mailman/listinfo/dev
