Hi Maheshakya,

We'll be adding a cluster diagram to the model summary for clustering algorithms.
Please let us know if there are any other useful evaluation metrics.

Thanks

On Thu, May 28, 2015 at 11:58 AM, Maheshakya Wijewardena <
[email protected]> wrote:

> Nice.
>
> In addition to the charts for classification, I think we need some visualization
> method for clustering as well, since there's nothing to show after
> clustering models are trained. Maybe a chart with respect to two selected
> attributes.
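A minimal sketch of the data behind such a chart, assuming each row is a list of numeric attribute values and that the cluster assignment for every row is already available. The function name `cluster_scatter_series` is hypothetical and is not an existing WSO2 ML or Spark API; it only groups the two selected attributes by cluster id so each cluster can be drawn in its own color.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def cluster_scatter_series(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    x_idx: int,
    y_idx: int,
) -> Dict[int, List[Tuple[float, float]]]:
    """Group points by cluster id, keeping only the two selected attributes.

    Each value is the list of (x, y) points a scatter chart would draw in
    that cluster's color.
    """
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    series: Dict[int, List[Tuple[float, float]]] = defaultdict(list)
    for row, label in zip(rows, labels):
        series[label].append((row[x_idx], row[y_idx]))
    return dict(series)


# Usage: three rows with three attributes, charted on attributes 0 and 2.
rows = [[1.0, 5.0, 2.0], [1.2, 4.0, 2.1], [8.0, 0.5, 9.0]]
labels = [0, 0, 1]
print(cluster_scatter_series(rows, labels, x_idx=0, y_idx=2))
# {0: [(1.0, 2.0), (1.2, 2.1)], 1: [(8.0, 9.0)]}
```

Which two attributes to chart would be a user choice in the model summary; the sketch leaves that to the caller via `x_idx` and `y_idx`.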
>
> On Thu, May 28, 2015 at 11:46 AM, CD Athuraliya <[email protected]>
> wrote:
>
>> Hi all,
>>
>> A residual plot has been added for numerical prediction algorithms. Using
>> standard chart types as much as possible is better IMO, since it reduces user
>> confusion in understanding the visualizations. I think we need to look for some
>> standard chart types for classification algorithms (both binary and
>> multiclass) as well [1].
>>
>> [1] http://oobaloo.co.uk/visualising-classifier-results-with-ggplot2
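One standard chart that works for both binary and multiclass classifiers is a confusion matrix (usually drawn as a heat map); this is offered as an example only, not as what [1] shows. A minimal sketch of the counts behind it, assuming labels arrive as doubles, as in MLlib's `LabeledPoint`, and that the test results are available as parallel lists of actual and predicted labels. The function name `confusion_matrix` is hypothetical here.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_matrix(
    actual: Sequence[float], predicted: Sequence[float]
) -> Tuple[List[float], List[List[int]]]:
    """Count (actual, predicted) pairs for any number of classes.

    Returns the sorted class list and a matrix where matrix[i][j] is the
    number of test points whose actual class is classes[i] and whose
    predicted class is classes[j].
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    classes = sorted(set(actual) | set(predicted))
    index: Dict[float, int] = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return classes, matrix


# Usage: a 3-class problem with labels encoded as 0.0, 1.0 and 2.0.
classes, matrix = confusion_matrix([0.0, 1.0, 2.0, 0.0], [0.0, 2.0, 2.0, 1.0])
print(classes)  # [0.0, 1.0, 2.0]
print(matrix)   # [[1, 1, 0], [0, 0, 1], [0, 0, 1]]
```

The diagonal holds correct predictions; for a binary model the same code yields the familiar 2x2 table.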
>>
>> Thanks
>>
>> On Wed, May 27, 2015 at 5:38 AM, Srinath Perera <[email protected]> wrote:
>>
>>> +1, shall we try those?
>>> On 26 May 2015 22:52, "Upul Bandara" <[email protected]> wrote:
>>>
>>>> +1 for residual plots.
>>>>
>>>> Though I haven't used it myself, the residual plot is a useful diagnostic
>>>> tool for regression models.
>>>> In particular, non-linearity that a regression model fails to capture can
>>>> easily be identified with it.
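A minimal sketch of why this works, using made-up data: fit a straight line by ordinary least squares to points that follow y = x^2 and look at the residuals (actual minus predicted). The helper names `fit_line` and `residuals` are hypothetical, and plain Python is used instead of Spark purely for illustration.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(
    xs: Sequence[float], ys: Sequence[float], intercept: float, slope: float
) -> List[float]:
    """Residual = actual - predicted, one per point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Quadratic data that a straight line cannot capture.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]
intercept, slope = fit_line(xs, ys)
print(residuals(xs, ys, intercept, slope))  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals are positive at both ends and negative in the middle. Plotted against x, that systematic U-shape is the visual cue for non-linearity; a well-specified model would leave residuals scattered around zero with no pattern.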
>>>>
>>>> The book "An Introduction to Statistical Learning" [1] (pages 92-96)
>>>> contains some useful information about residual plots.
>>>>
>>>> [1]. http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Fourth%20Printing.pdf
>>>>
>>>> On Tue, May 26, 2015 at 8:47 PM, Supun Sethunga <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi CD,
>>>>>
>>>>> As it came up in the offline discussion as well, IMHO this plot may not
>>>>> be the best option for classification. For regression, however, we can
>>>>> use this plot with a slight modification: take the difference between the
>>>>> predicted and actual values (rather than the values themselves) and plot
>>>>> that against a predictor variable (just as it's being done at the moment).
>>>>> We can also add a third variable (a categorical feature) to color the
>>>>> points. This is a standard plot (AKA residual plot) that is usually used
>>>>> to evaluate regression models.
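A minimal sketch of the points such a plot needs, assuming the predictor value, actual value, predicted value and a categorical feature are available for each test point. It uses the conventional sign (actual minus predicted); flipping the sign only mirrors the plot. The function name `residual_series` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def residual_series(
    predictor: Sequence[float],
    actual: Sequence[float],
    predicted: Sequence[float],
    category: Sequence[str],
) -> Dict[str, List[Tuple[float, float]]]:
    """Build residual-plot points (x = predictor value, y = actual - predicted),
    grouped by a categorical feature so each group gets its own color."""
    if not (len(predictor) == len(actual) == len(predicted) == len(category)):
        raise ValueError("all inputs must have the same length")
    series: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for x, a, p, c in zip(predictor, actual, predicted, category):
        series[c].append((x, a - p))
    return dict(series)


points = residual_series(
    predictor=[1.0, 2.0, 3.0],
    actual=[10.0, 12.5, 15.0],
    predicted=[9.5, 13.0, 15.0],
    category=["red", "blue", "red"],
)
print(points)  # {'red': [(1.0, 0.5), (3.0, 0.0)], 'blue': [(2.0, -0.5)]}
```

Note that this needs the feature values next to each prediction, which is exactly the data the test results alone do not provide.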
>>>>>
>>>>> One other thing we can try out is doing the same for classification
>>>>> as well, i.e. taking the absolute difference between the actual
>>>>> probability (0 or 1) and the predicted probability, plotting that, and
>>>>> seeing whether it gives a better overall picture. Not sure how it will
>>>>> come out though :) If it comes out right, then any point that lies above
>>>>> 0.5 (or the threshold we used) is wrongly classified, and hence we can get
>>>>> a rough idea of the values of the x-axis feature for which points get
>>>>> wrongly classified. I mean, we should be able to see any pattern, if one
>>>>> exists.
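A minimal sketch of this idea for a binary classifier, assuming the model outputs a probability for class 1 and that each point's x-axis feature value is available. With a 0.5 threshold, |actual - probability| > 0.5 marks exactly the wrongly classified points (a probability of exactly 0.5 is not flagged). For another threshold t the cutoff is not symmetric: it is 1 - t for actual positives and t for actual negatives, so this single 0.5 line applies only to the 0.5 case. The function name `classification_residuals` is hypothetical.

```python
from typing import List, Sequence, Tuple


def classification_residuals(
    feature: Sequence[float],
    actual: Sequence[int],
    prob_positive: Sequence[float],
) -> List[Tuple[float, float, bool]]:
    """Return (feature value, |actual - predicted probability|, misclassified).

    Assumes a 0.5 decision threshold: a residual above 0.5 means the
    predicted label differs from the actual label.
    """
    out = []
    for x, y, p in zip(feature, actual, prob_positive):
        residual = abs(y - p)
        out.append((x, residual, residual > 0.5))
    return out


for row in classification_residuals(
    feature=[0.1, 0.4, 0.7, 0.9],
    actual=[0, 1, 1, 0],
    prob_positive=[0.25, 0.25, 0.75, 0.625],
):
    print(row)
# (0.1, 0.25, False)
# (0.4, 0.75, True)
# (0.7, 0.25, False)
# (0.9, 0.625, True)
```

Plotting the first two fields and drawing a horizontal line at 0.5 would show whether the flagged points cluster in some range of the feature.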
>>>>>
>>>>> Thanks,
>>>>> Supun
>>>>>
>>>>> On Tue, May 26, 2015 at 6:08 PM, CD Athuraliya <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Plotting predicted and actual values against a feature doesn't look
>>>>>> very intuitive, especially for non-probabilistic models. Please check the
>>>>>> attachments. Any thoughts on making this visualization better?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Fri, May 22, 2015 at 3:27 PM, Srinath Perera <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Yes, rerunning the test using a random sample from the test data is OK.
>>>>>>>
>>>>>>> --Srinath
>>>>>>>
>>>>>>> On Fri, May 22, 2015 at 2:28 PM, CD Athuraliya <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Srinath,
>>>>>>>>
>>>>>>>> That random sample still will not correspond to the predicted vs.
>>>>>>>> actual values in the test results, given that there is no mapping
>>>>>>>> between random sample data points and test result points. One thing we
>>>>>>>> can do is run the test separately (using the same model) on the sampled
>>>>>>>> data, for the sole purpose of visualization. Any other options?
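A minimal sketch of rerunning prediction on a random sample of the test data, so that every visualization point carries its feature vector together with the actual and predicted values. The `predict` callable stands in for the trained model and is an assumption; in the real pipeline it would be whatever prediction call the Spark model exposes. The function name `sample_and_predict` and the fixed seed are also hypothetical choices.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Tuple[List[float], float]  # (feature vector, actual value)


def sample_and_predict(
    test_rows: Sequence[Row],
    predict: Callable[[List[float]], float],
    sample_size: int,
    seed: int = 42,
) -> List[Tuple[List[float], float, float]]:
    """Re-run prediction on a random sample of the test data with the same model.

    Features, actual and predicted values are produced together, so no
    mapping back to the original test results is needed.
    """
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    k = min(sample_size, len(test_rows))
    sample = rng.sample(list(test_rows), k)
    return [(features, actual, predict(features)) for features, actual in sample]


# Usage with a stand-in model that predicts 2 * first feature.
test_rows = [([float(i), 1.0], 2.0 * i + 1.0) for i in range(100)]
points = sample_and_predict(test_rows, lambda f: 2.0 * f[0], sample_size=5)
for features, actual, predicted in points:
    print(features[0], actual, predicted)
```

The cost is one extra prediction pass over the sample only, which stays small as long as the sample size is capped.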
>>>>>>>>
>>>>>>>> On Fri, May 22, 2015 at 2:06 PM, Srinath Perera <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi CD,
>>>>>>>>>
>>>>>>>>> Can we take a random sample from the test data and use that for
>>>>>>>>> this process?
>>>>>>>>>
>>>>>>>>> --Srinath
>>>>>>>>>
>>>>>>>>> On Fri, May 22, 2015 at 12:00 PM, CD Athuraliya <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> To implement $subject in ML, we need all feature values of the
>>>>>>>>>> dataset along with the predicted and actual values for the test data.
>>>>>>>>>> But Spark only returns predicted and actual values as test results.
>>>>>>>>>> Right now we use a random sample of 10,000 data rows for other
>>>>>>>>>> visualizations, and we cannot use the same data for this
>>>>>>>>>> visualization, since those 10,000 random rows do not correspond to
>>>>>>>>>> the test data (the test data is split off from the dataset according
>>>>>>>>>> to the train data fraction at the model building stage).
>>>>>>>>>>
>>>>>>>>>> One option is to persist the test data at the testing stage, but it
>>>>>>>>>> can be too large for some datasets, depending on the train data
>>>>>>>>>> fraction. I'd appreciate your comments on this.
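One way to avoid persisting the whole test set is to keep only a bounded, uniformly random sample of (features, actual, predicted) tuples while the test runs. The sketch below uses reservoir sampling (Algorithm R), a standard technique swapped in here for illustration rather than something this thread settled on; memory grows with the capacity (e.g. the same 10,000 rows used elsewhere), not with the test set size. It is plain Python over an iterable, not Spark code, and the function name `reservoir_sample` is hypothetical.

```python
import random
from typing import Iterable, List, Optional, Tuple

Point = Tuple[List[float], float, float]  # (features, actual, predicted)


def reservoir_sample(
    points: Iterable[Point], capacity: int, seed: Optional[int] = None
) -> List[Point]:
    """Keep a uniform random sample of at most `capacity` points in one pass."""
    rng = random.Random(seed)
    reservoir: List[Point] = []
    for i, point in enumerate(points):
        if i < capacity:
            reservoir.append(point)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # uniform over 0..i, inclusive
            if j < capacity:
                reservoir[j] = point  # replace with probability capacity / (i + 1)
    return reservoir


# Usage: stream 100,000 test points but keep only 10,000 of them.
stream = (([float(i)], float(i), float(i) + 0.5) for i in range(100_000))
sample = reservoir_sample(stream, capacity=10_000, seed=7)
print(len(sample))  # 10000
```

Because it needs only a single pass, the sampling could happen while test results are produced, without knowing the test set size in advance.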
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> CD
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> *CD Athuraliya*
>>>>>>>>>> Software Engineer
>>>>>>>>>> WSO2, Inc.
>>>>>>>>>> lean . enterprise . middleware
>>>>>>>>>> Mobile: +94 716288847 <94716288847>
>>>>>>>>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
>>>>>>>>>> <https://twitter.com/cdathuraliya> | Blog
>>>>>>>>>> <http://cdathuraliya.tumblr.com/>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> ============================
>>>>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>>>> Phone: 0772360902
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Supun Sethunga*
>>>>> Software Engineer
>>>>> WSO2, Inc.
>>>>> http://wso2.com/
>>>>> lean | enterprise | middleware
>>>>> Mobile : +94 716546324
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Upul Bandara,
>>>> Associate Technical Lead, WSO2, Inc.,
>>>> Mob: +94 715 468 345.
>>>>
>>>
>>
>>
>
>
>
> --
> Pruthuvi Maheshakya Wijewardena
> Software Engineer
> WSO2 Lanka (Pvt) Ltd
> Email: [email protected]
> Mobile: +94711228855
>
>
>


-- 
*CD Athuraliya*
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
