+1, shall we try those?

On 26 May 2015 22:52, "Upul Bandara" <[email protected]> wrote:

> +1 for residual plots.
>
> Though I haven't used it myself, a residual plot is a useful diagnostic
> tool for regression models. In particular, non-linearity in regression
> models can be easily identified using it.
>
> The book "An Introduction to Statistical Learning" [1] (pages 92-96)
> contains some useful information about residual plots.
>
> [1] http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Fourth%20Printing.pdf
>
> On Tue, May 26, 2015 at 8:47 PM, Supun Sethunga <[email protected]> wrote:
>
>> Hi CD,
>>
>> As it came up in the offline discussion as well, IMHO this plot may not
>> be the best option for classification. For regression, however, we can
>> use this plot with a slight modification: take the difference between
>> the predicted and actual values (rather than the values themselves) and
>> plot that against a predictor variable (just like it's done atm). We
>> can also add a third variable (a categorical feature) to color the
>> points. This is a standard plot (AKA residual plot) that is usually
>> used to evaluate regression models.
>>
>> One other thing we can try is doing the same for classification as
>> well, i.e. taking the difference between the actual probability (0 or 1)
>> and the predicted probability, plotting that, and seeing whether it
>> gives a better overall picture. Not sure how it will come out though :)
>> If it comes out right, then any point that lies above 0.5 (or the
>> threshold we used) is wrongly classified, and hence we can get a rough
>> idea of the x-axis feature values for which points get wrongly
>> classified. I mean, we should be able to see a pattern, if one exists.
>>
>> Thanks,
>> Supun
>>
>> On Tue, May 26, 2015 at 6:08 PM, CD Athuraliya <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> Plotting predicted and actual values against a feature doesn't look
>>> very intuitive, especially for non-probabilistic models. Please check
>>> the attachments. Any thoughts on making this visualization better?
>>>
>>> Thanks
>>>
>>> On Fri, May 22, 2015 at 3:27 PM, Srinath Perera <[email protected]> wrote:
>>>
>>>> Yes, rerunning using a random sample from the test data is OK.
>>>>
>>>> --Srinath
>>>>
>>>> On Fri, May 22, 2015 at 2:28 PM, CD Athuraliya <[email protected]> wrote:
>>>>
>>>>> Hi Srinath,
>>>>>
>>>>> That random sample still will not correspond to the predicted vs.
>>>>> actual values in the test results, given that there is no mapping
>>>>> between the random sample data points and the test result points.
>>>>> One thing we can do is run the test separately (using the same
>>>>> model) on the sampled data for the sole purpose of visualization.
>>>>> Any other options?
>>>>>
>>>>> On Fri, May 22, 2015 at 2:06 PM, Srinath Perera <[email protected]> wrote:
>>>>>
>>>>>> Hi CD,
>>>>>>
>>>>>> Can we take a random sample from the test data and use that for
>>>>>> this process?
>>>>>>
>>>>>> --Srinath
>>>>>>
>>>>>> On Fri, May 22, 2015 at 12:00 PM, CD Athuraliya <[email protected]> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> To implement $subject in ML we need all feature values of the
>>>>>>> dataset against the predicted and actual values for the test
>>>>>>> data. But Spark only returns predicted and actual values as test
>>>>>>> results. Right now we use 10,000 random data rows for other
>>>>>>> visualizations, and we cannot use the same data for this
>>>>>>> visualization since those 10,000 random rows do not correspond to
>>>>>>> the test data (the test data is split off from the dataset
>>>>>>> according to the train data fraction at the model building stage).
>>>>>>>
>>>>>>> One option is to persist the test data at the testing stage, but
>>>>>>> it can be too large for some datasets, depending on the train
>>>>>>> data fraction. I would appreciate your comments on this.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> CD
>>>>>>>
>>>>>>> --
>>>>>>> *CD Athuraliya*
>>>>>>> Software Engineer
>>>>>>> WSO2, Inc.
>>>>>>> lean . enterprise . middleware
>>>>>>> Mobile: +94 716288847
>>>>>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
>>>>>>> <https://twitter.com/cdathuraliya> | Blog
>>>>>>> <http://cdathuraliya.tumblr.com/>
>>>>>>
>>>>>> --
>>>>>> ============================
>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>> Phone: 0772360902
>>
>> --
>> *Supun Sethunga*
>> Software Engineer
>> WSO2, Inc.
>> http://wso2.com/
>> lean | enterprise | middleware
>> Mobile : +94 716546324
>
> --
> Upul Bandara,
> Associate Technical Lead, WSO2, Inc.,
> Mob: +94 715 468 345.
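
For reference, the two ideas in the thread above (rerunning the model on a random sample of the test data, then plotting residuals against a feature) could be sketched roughly as follows. This is only a minimal illustration in plain Python, not the WSO2 ML or Spark API: `sample_and_predict`, `residual_points`, `misclassified_feature_values` and the `predict` callable are hypothetical names, and `predict` stands in for whatever trained model is used. The residual follows the thread's wording (predicted minus actual); many textbooks use actual minus predicted, which only flips the sign.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical row layout: (feature values, actual label or value).
Row = Tuple[Sequence[float], float]
Scored = Tuple[Sequence[float], float, float]  # (features, actual, predicted)


def sample_and_predict(
    test_rows: Sequence[Row],
    predict: Callable[[Sequence[float]], float],
    sample_size: int,
    seed: Optional[int] = None,
) -> List[Scored]:
    """Draw a random sample of test rows and rerun the model on it.

    Each result keeps its feature values next to the actual and predicted
    values, which is the mapping the plain test results do not provide.
    """
    rng = random.Random(seed)
    k = min(sample_size, len(test_rows))  # never ask for more rows than exist
    sample = rng.sample(list(test_rows), k)
    return [(features, actual, predict(features)) for features, actual in sample]


def residual_points(scored: Sequence[Scored], feature_index: int) -> List[Tuple[float, float]]:
    """Return (feature value, residual) pairs, with residual = predicted - actual."""
    return [
        (features[feature_index], predicted - actual)
        for features, actual, predicted in scored
    ]


def misclassified_feature_values(
    scored: Sequence[Scored], feature_index: int, threshold: float = 0.5
) -> List[float]:
    """Classification variant: feature values of points whose
    |actual (0 or 1) - predicted probability| is strictly above the threshold."""
    return [
        features[feature_index]
        for features, actual, predicted in scored
        if abs(actual - predicted) > threshold
    ]


if __name__ == "__main__":
    # Toy data and a toy "model", purely for illustration.
    rows = [([float(i)], 3.0 * i) for i in range(1000)]
    scored = sample_and_predict(rows, lambda f: 3.0 * f[0] + 0.5, sample_size=5, seed=7)
    for x, residual in residual_points(scored, feature_index=0):
        print(x, residual)
```

The (feature value, residual) pairs can then be passed to any scatter plot, optionally colored by a categorical feature. Note that the "above the threshold means wrongly classified" reading is only symmetric for both classes when the decision threshold is 0.5; for other thresholds the cut-off differs between actual 0 and actual 1.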
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
