Re: [Architecture] ML Model Summary Illustration and Comparison

Maheshakya Wijewardena Mon, 13 Jul 2015 23:27:07 -0700

But again, WSSSE is useful only for comparing k-means models trained with
different values of k.
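
As a rough illustration of that point, the sketch below runs a plain-Python
Lloyd's k-means for several values of k on toy data and reports the WSSSE of
each model. Everything in it (the data, the deterministic initialization, and
the function names sq_dist, nearest, kmeans, wssse) is an assumption made for
illustration; it is not the WSO2 ML or Spark MLlib implementation.

```python
# Illustrative sketch only: toy data and hypothetical helper names,
# not the WSO2 ML or Spark MLlib code.

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centers):
    """Index of the center closest to the point (first one wins on ties)."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))

def kmeans(points, k, max_iter=20):
    """Lloyd's algorithm with a deterministic init (an assumption made here)."""
    # Start from k points taken at evenly spaced indices of the input list.
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Move each center to the mean of its members; keep it if it has none.
        new_centers = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def wssse(points, centers):
    """Within Set Sum of Squared Errors: squared distance to the nearest center."""
    return sum(sq_dist(p, centers[nearest(p, centers)]) for p in points)

# Two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]

wssse_by_k = {k: wssse(points, kmeans(points, k)) for k in (1, 2, 3)}
for k, err in wssse_by_k.items():
    print(k, err)
```

On this toy data the WSSSE falls from 404.0 at k=1 to 4.0 at k=2 and then
only to 3.0 at k=3, so the value is informative as a curve over k (the
"elbow") rather than as a standalone quality score for one model.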


On Tue, Jul 14, 2015 at 11:46 AM, Maheshakya Wijewardena <
[email protected]> wrote:

> I'm not sure whether within-cluster sum of squared error would be a good
> metric for k-means, since minimizing that error is exactly what k-means
> does in its optimization[1]. Therefore, the result we get will always look
> good according to that measure. I think an internal validation method[2]
> that does not depend on the same optimization objective as k-means would
> be more suitable.
>
> [1] https://en.wikipedia.org/wiki/K-means_clustering#Description
> [2] http://www.universitypress.org.uk/journals/cc/20-463.pdf
>
> On Mon, Jul 13, 2015 at 12:04 PM, Nirmal Fernando <[email protected]> wrote:
>
>> https://spark.apache.org/docs/latest/mllib-clustering.html#k-means
>>
>> On Mon, Jul 13, 2015 at 12:03 PM, Nirmal Fernando <[email protected]>
>> wrote:
>>
>>> Why can't we use Within Set Sum of Squared Error (WSSSE) as a measure of
>>> clustering?
>>>
>>>
>>> On Fri, May 15, 2015 at 4:34 PM, CD Athuraliya <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We have implemented model comparison for classification and numerical
>>>> prediction with the following measures.
>>>>
>>>>    - Binary and multiclass classification - Accuracy
>>>>    - Numerical prediction - Mean squared error
>>>>
>>>> We are currently working on a sorted view of models according to their
>>>> accuracy/MSE. This release will not support cross comparison for clustering
>>>> algorithms.
>>>>
>>>> Thanks,
>>>> CD
>>>>
>>>> On Tue, May 5, 2015 at 5:41 PM, CD Athuraliya <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Which chart types and implementations are we going to proceed with for
>>>>> alpha? We will be able to finalize the comparison and summary views with
>>>>> them.
>>>>>
>>>>> Thanks,
>>>>> CD
>>>>>
>>>>> On Fri, May 1, 2015 at 9:39 AM, Supun Sethunga <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Nirmal,
>>>>>>
>>>>>> During the last discussion, we decided to show a numerical value
>>>>>> (Accuracy / Std error) next to each model in the model listing view to
>>>>>> illustrate its performance, so that the user can get an overall idea at
>>>>>> a glance, and to have the ROC comparison on a separate page. I think we
>>>>>> still need to figure out where the latter would fit in the UI
>>>>>> navigation.
>>>>>>
>>>>>> Thanks,
>>>>>> Supun
>>>>>>
>>>>>> On Thu, Apr 30, 2015 at 6:51 PM, Nirmal Fernando <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks for summarizing, Supun. Did we think about how we are going to
>>>>>>> create the cross-model comparison view?
>>>>>>>
>>>>>>> On Thu, Apr 30, 2015 at 8:33 AM, Supun Sethunga <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> [-strategy@, +architecture@]
>>>>>>>>
>>>>>>>> On Thu, Apr 30, 2015 at 5:58 PM, Srinath Perera <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> should go to arch@
>>>>>>>>>
>>>>>>>>> On Thu, Apr 30, 2015 at 6:28 AM, Srinath Perera <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Supun!! this looks good.
>>>>>>>>>>
>>>>>>>>>> --Srinath
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 30, 2015 at 6:25 AM, Supun Sethunga <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> Following is the breakdown of the Model Summary illustrations
>>>>>>>>>>> that ML can support at the moment. I'm initiating this thread to
>>>>>>>>>>> finalize what we can and cannot support in the initial release.
>>>>>>>>>>> Blue colored ones are yet to be implemented.
>>>>>>>>>>>
>>>>>>>>>>>    - Numerical Prediction
>>>>>>>>>>>       - Standard Error [1]
>>>>>>>>>>>       - Residual Plot [2]
>>>>>>>>>>>       - Feature Importance (*Graph containing the weights assigned
>>>>>>>>>>>       to each feature in the model*)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    - Classification
>>>>>>>>>>>       - Binary
>>>>>>>>>>>          - ROC [3]
>>>>>>>>>>>             - AUC
>>>>>>>>>>>          - Confusion Matrix (*Available on Spark as a static
>>>>>>>>>>>          metric. But if this is calculated manually, it can be
>>>>>>>>>>>          made interactive, so that the user can find the optimal
>>>>>>>>>>>          threshold*)
>>>>>>>>>>>          - Accuracy
>>>>>>>>>>>          - Feature Importance
>>>>>>>>>>>       - Multi-Class
>>>>>>>>>>>          - Confusion Matrix (*Available on Spark*)
>>>>>>>>>>>          - Accuracy
>>>>>>>>>>>          - Feature Importance
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    - Clustering
>>>>>>>>>>>       - Scatter plot with clustered points
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *Cross-comparing Models*
>>>>>>>>>>>
>>>>>>>>>>> As you can see, the major limitation when cross-comparing models
>>>>>>>>>>> within a project is that different categories have different
>>>>>>>>>>> summary statistics/plots, and hence we cannot compare two models
>>>>>>>>>>> from two different categories.
>>>>>>>>>>>
>>>>>>>>>>> Following are the possibilities:
>>>>>>>>>>>
>>>>>>>>>>>    - ROC can be used to compare Binary classification models.
>>>>>>>>>>>    - Cobweb (a radar chart) can be used to compare Multi-Class
>>>>>>>>>>>    classification models. (This is a possible alternative to ROC
>>>>>>>>>>>    in the multi-class case. The drawback is that the graph becomes
>>>>>>>>>>>    very unclear when the models have a large number of features.)
>>>>>>>>>>>    [4] [5]
>>>>>>>>>>>    - Accuracy can be used to compare all classification models.
>>>>>>>>>>>
>>>>>>>>>>> Please add if I've missed anything.
>>>>>>>>>>>
>>>>>>>>>>> *Ref:*
>>>>>>>>>>> [1] http://onlinestatbook.com/2/regression/accuracy.html
>>>>>>>>>>> [2] http://stattrek.com/regression/residual-analysis.aspx
>>>>>>>>>>> [3] http://www.sciencedirect.com/science/article/pii/S016786550500303X
>>>>>>>>>>> [4] http://www.academia.edu/2519022/Visualization_and_analysis_of_classifiers_performance_in_multi-class_medical_data
>>>>>>>>>>> [5] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.107.8450&rep=rep1&type=pdf
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Supun
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> *Supun Sethunga*
>>>>>>>>>>> Software Engineer
>>>>>>>>>>> WSO2, Inc.
>>>>>>>>>>> http://wso2.com/
>>>>>>>>>>> lean | enterprise | middleware
>>>>>>>>>>> Mobile : +94 716546324
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> ============================
>>>>>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>>>>> Phone: 0772360902
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Thanks & regards,
>>>>>>> Nirmal
>>>>>>>
>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>> Mobile: +94715779733
>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *CD Athuraliya*
>>>>> Software Engineer
>>>>> WSO2, Inc.
>>>>> lean . enterprise . middleware
>>>>> Mobile: +94 716288847 <94716288847>
>>>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
>>>>> <https://twitter.com/cdathuraliya> | Blog
>>>>> <http://cdathuraliya.tumblr.com/>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Architecture mailing list
>> [email protected]
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>
>>
>
>
> --
> Pruthuvi Maheshakya Wijewardena
> Software Engineer
> WSO2 : http://wso2.com/
> Email: [email protected]
> Mobile: +94711228855
>
>
>



