Re: [Dev] [ML] Predicted vs. actuals chart in model summary

Srinath Perera Tue, 26 May 2015 17:09:07 -0700

+1. Shall we try those?
On 26 May 2015 22:52, "Upul Bandara" <[email protected]> wrote:


> +1 for residual plots.
>
> Though I haven't used them myself, residual plots are a useful diagnostic
> tool for regression models. In particular, non-linearity in a regression
> model can be easily identified using one.
>
> The book "An Introduction to Statistical Learning" [1] (pages 92-96)
> contains some useful information about residual plots.
>
> [1]. http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Fourth%20Printing.pdf
>
> On Tue, May 26, 2015 at 8:47 PM, Supun Sethunga <[email protected]> wrote:
>
>> Hi CD,
>>
>> As came up in the offline discussion as well, IMHO this plot may not be
>> the best option for classification. For regression we can use it with a
>> slight modification: take the difference between the predicted and actual
>> values (rather than the values themselves) and plot that against a
>> predictor variable (just as is done at the moment). We can also add a
>> third variable (a categorical feature) to color the points. This is a
>> standard plot (AKA a residual plot) that is commonly used to evaluate
>> regression models.
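A minimal sketch of the residual computation described above, assuming the
feature values, actual values and predicted values are already available as
equal-length lists. The function names and the sample numbers are
hypothetical and are not taken from the ML codebase or from Spark; the output
is simply the set of (x, residual) series that a charting library could draw,
one series per category for coloring.

```python
from collections import defaultdict


def residual_points(feature_values, actuals, predictions, categories=None):
    """Return (x, residual, category) triples for a residual plot.

    The residual is taken as predicted - actual, as described in the thread.
    """
    if not (len(feature_values) == len(actuals) == len(predictions)):
        raise ValueError("inputs must have the same length")
    if categories is None:
        categories = [None] * len(actuals)
    elif len(categories) != len(actuals):
        raise ValueError("categories must match the other inputs in length")
    return [
        (x, predicted - actual, category)
        for x, actual, predicted, category in zip(
            feature_values, actuals, predictions, categories
        )
    ]


def group_by_category(points):
    """Split the points into one (x, residual) series per category."""
    groups = defaultdict(list)
    for x, residual, category in points:
        groups[category].append((x, residual))
    return dict(groups)


# Made-up illustration data, not real test results.
points = residual_points(
    feature_values=[1.0, 2.0, 3.0, 4.0],
    actuals=[2.0, 4.0, 6.0, 8.0],
    predictions=[2.5, 3.5, 6.0, 9.0],
    categories=["a", "b", "a", "b"],
)
series = group_by_category(points)
```

If the residuals scatter evenly around zero across the x-axis, the model fits
that feature reasonably; a visible curve or funnel shape hints at
non-linearity or non-constant variance.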
>>
>> One other thing we can try is doing the same for classification, i.e.
>> taking the difference between the actual probability (0 or 1) and the
>> predicted probability, plotting that, and seeing whether it gives a
>> better overall picture. Not sure how it will come out though :) If it
>> comes out right, then any point whose difference is more than 0.5 (or the
>> threshold we used) away from zero is wrongly classified, and hence we can
>> get a rough idea of the values of the x-axis feature for which points get
>> wrongly classified. I mean, we should be able to see a pattern, if one
>> exists.
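A rough sketch of the classification variant, assuming binary labels (0 or 1)
and a predicted probability for class 1. The function name and data are
hypothetical. Misclassification is decided here by comparing the predicted
probability to the threshold directly; with a threshold of 0.5 this matches
"difference more than 0.5 away from zero" (ignoring exact ties), while for
other thresholds the cut-off differs between the two classes.

```python
def classification_residuals(feature_values, actual_labels,
                             predicted_probabilities, threshold=0.5):
    """Return (x, difference, misclassified) triples.

    difference = actual label (0 or 1) - predicted probability of class 1.
    """
    points = []
    for x, label, probability in zip(
        feature_values, actual_labels, predicted_probabilities
    ):
        if label not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        difference = label - probability
        # Predicted class is 1 when the probability reaches the threshold.
        predicted_label = 1 if probability >= threshold else 0
        points.append((x, difference, predicted_label != label))
    return points


# Made-up illustration data, not real test results.
points = classification_residuals(
    feature_values=[10, 20, 30, 40],
    actual_labels=[1, 0, 1, 0],
    predicted_probabilities=[0.75, 0.25, 0.25, 0.75],
)
# x-axis values at which points were wrongly classified.
wrong_x = [x for x, _, misclassified in points if misclassified]
```

Plotting the difference against the x-axis feature and highlighting the
flagged points would show whether misclassifications cluster in a particular
range of that feature.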
>>
>> Thanks,
>> Supun
>>
>> On Tue, May 26, 2015 at 6:08 PM, CD Athuraliya <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> Plotting predicted and actual values against a feature doesn't look very
>>> intuitive, especially for non-probabilistic models. Please check the
>>> attachments. Any thoughts on making this visualization better?
>>>
>>> Thanks
>>>
>>> On Fri, May 22, 2015 at 3:27 PM, Srinath Perera <[email protected]>
>>> wrote:
>>>
>>>> Yes, rerunning on a random sample from the test data is OK.
>>>>
>>>> --Srinath
>>>>
>>>> On Fri, May 22, 2015 at 2:28 PM, CD Athuraliya <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Srinath,
>>>>>
>>>>> That random sample will still not correspond to the predicted vs.
>>>>> actual values in the test results, since there is no mapping between
>>>>> the random sample's data points and the test result points. One thing
>>>>> we can do is run the test separately (using the same model) on the
>>>>> sampled data for the sole purpose of visualization. Any other options?
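The idea of running the model again on a sample, purely for visualization,
could be sketched as below. This is only an illustration in plain Python: the
function name, the (features, actual) row layout and the toy model are all
assumptions, and no Spark API is used. The point is that predicting on the
sampled rows keeps each prediction next to the feature values it came from,
which the separate test results do not provide.

```python
import random


def sample_for_visualization(test_rows, predict, sample_size, seed=None):
    """Draw a random sample of test rows and predict on just those rows.

    test_rows: list of (features, actual) pairs.
    predict:   callable mapping features to a predicted value.
    Returns (features, actual, predicted) triples.
    """
    rng = random.Random(seed)
    size = min(sample_size, len(test_rows))
    sampled = rng.sample(test_rows, size)
    return [
        (features, actual, predict(features))
        for features, actual in sampled
    ]


def toy_model(features):
    """Hypothetical stand-in for the already trained model."""
    return 2.0 * features[0]


# Made-up test data: one feature per row, with a known actual value.
test_rows = [([float(i)], 2.0 * i + 1.0) for i in range(100)]
rows = sample_for_visualization(test_rows, toy_model, sample_size=10, seed=42)
```

Only the sampled triples need to be kept for the chart, so the full test set
does not have to be persisted.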
>>>>>
>>>>> On Fri, May 22, 2015 at 2:06 PM, Srinath Perera <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi CD,
>>>>>>
>>>>>> Can we take a random sample from the test data and use that for this
>>>>>> process?
>>>>>>
>>>>>> --Srinath
>>>>>>
>>>>>> On Fri, May 22, 2015 at 12:00 PM, CD Athuraliya <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> To implement $subject in ML we need all feature values of the
>>>>>>> dataset alongside the predicted and actual values for the test data,
>>>>>>> but Spark only returns predicted and actual values as test results.
>>>>>>> Right now we use 10,000 random data rows for other visualizations. We
>>>>>>> cannot use the same data for this visualization, since those 10,000
>>>>>>> random rows do not correspond to the test data (the test data is
>>>>>>> split off from the dataset according to the train data fraction at
>>>>>>> the model building stage).
>>>>>>>
>>>>>>> One option is to persist the test data at the testing stage, but it
>>>>>>> can be too large for some datasets, depending on the train data
>>>>>>> fraction. I would appreciate your comments on this.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> CD
>>>>>>>
>>>>>>> --
>>>>>>> *CD Athuraliya*
>>>>>>> Software Engineer
>>>>>>> WSO2, Inc.
>>>>>>> lean . enterprise . middleware
>>>>>>> Mobile: +94 716288847 <94716288847>
>>>>>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
>>>>>>> <https://twitter.com/cdathuraliya> | Blog
>>>>>>> <http://cdathuraliya.tumblr.com/>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> ============================
>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>> Phone: 0772360902
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> *Supun Sethunga*
>> Software Engineer
>> WSO2, Inc.
>> http://wso2.com/
>> lean | enterprise | middleware
>> Mobile : +94 716546324
>>
>
>
>
> --
> Upul Bandara,
> Associate Technical Lead, WSO2, Inc.,
> Mob: +94 715 468 345.
>

_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
