I'm not sure whether within-cluster sum of squared error would be a good
metric for k-means, since minimizing that error is exactly what k-means does
in its optimization [1]. The result we get will therefore always look good
according to that measure. I think an internal validation method [2] that
does not depend on the objective k-means itself optimizes would be more
suitable.
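
To illustrate the concern, here is a minimal sketch in plain Python. It is
not the Spark MLlib implementation: the toy data set, the hand-written
Lloyd's loop with its evenly spaced initialisation, and the function names
are all assumptions made for this example. The silhouette coefficient is
used only as one example of an internal validation measure that is not the
k-means objective.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm with a deterministic, evenly spaced
    initialisation (an assumption for reproducibility, not what MLlib does)."""
    n = len(points)
    centroids = [points[i * n // k] for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return labels, centroids


def wssse(points, labels, centroids):
    """Within-set sum of squared errors: the quantity k-means minimizes."""
    return sum(sum((a - b) ** 2 for a, b in zip(p, centroids[l]))
               for p, l in zip(points, labels))


def silhouette(points, labels):
    """Mean silhouette coefficient; needs at least two non-empty clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data: two well-separated groups of four points each.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]

results = {}
for k in (2, 3, 4):
    labels, centroids = kmeans(points, k)
    results[k] = (wssse(points, labels, centroids),
                  silhouette(points, labels))
    print(k, results[k])
```

On this toy data the WSSSE keeps shrinking as k goes from 2 to 3 to 4, even
though there are clearly only two groups, while the silhouette coefficient
is highest at k = 2. That is the sense in which WSSSE alone cannot tell us
which clustering model is better.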

[1] https://en.wikipedia.org/wiki/K-means_clustering#Description
[2] http://www.universitypress.org.uk/journals/cc/20-463.pdf

On Mon, Jul 13, 2015 at 12:04 PM, Nirmal Fernando <[email protected]> wrote:

> https://spark.apache.org/docs/latest/mllib-clustering.html#k-means
>
> On Mon, Jul 13, 2015 at 12:03 PM, Nirmal Fernando <[email protected]> wrote:
>
>> Why can't we use Within Set Sum of Squared Error (WSSSE) as a measure of
>> clustering?
>>
>>
>> On Fri, May 15, 2015 at 4:34 PM, CD Athuraliya <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> We have implemented model comparison for classification and numerical
>>> prediction with the following measures:
>>>
>>>    - Binary and multiclass classification - Accuracy
>>>    - Numerical prediction - Mean squared error
>>>
>>> We are currently working on a sorted view of models according to their
>>> accuracy/MSE. This release will not support cross comparison for clustering
>>> algorithms.
>>>
>>> Thanks,
>>> CD
>>>
>>> On Tue, May 5, 2015 at 5:41 PM, CD Athuraliya <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Which chart types and implementations are we going to proceed with for
>>>> alpha? We will be able to finalize the comparison and summary views with
>>>> them.
>>>>
>>>> Thanks,
>>>> CD
>>>>
>>>> On Fri, May 1, 2015 at 9:39 AM, Supun Sethunga <[email protected]> wrote:
>>>>
>>>>> Hi Nirmal,
>>>>>
>>>>> During the last discussion, we decided to show a numerical value
>>>>> (Accuracy / Std error) next to each model in the model listing view to
>>>>> illustrate its performance, so that the user can get an overall idea at
>>>>> a glance, and to have the ROC comparison on a separate page. I think we
>>>>> still need to figure out where the latter would fit in the UI
>>>>> navigation.
>>>>>
>>>>> Thanks,
>>>>> Supun
>>>>>
>>>>> On Thu, Apr 30, 2015 at 6:51 PM, Nirmal Fernando <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Thanks for summarizing, Supun. Did we think about how we are going to
>>>>>> create the cross-model comparison view?
>>>>>>
>>>>>> On Thu, Apr 30, 2015 at 8:33 AM, Supun Sethunga <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> [-strategy@, +architecture@]
>>>>>>>
>>>>>>> On Thu, Apr 30, 2015 at 5:58 PM, Srinath Perera <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> should go to arch@
>>>>>>>>
>>>>>>>> On Thu, Apr 30, 2015 at 6:28 AM, Srinath Perera <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Supun!! this looks good.
>>>>>>>>>
>>>>>>>>> --Srinath
>>>>>>>>>
>>>>>>>>> On Thu, Apr 30, 2015 at 6:25 AM, Supun Sethunga <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> Following is the breakdown of the Model Summary illustrations
>>>>>>>>>> that can be supported by ML at the moment. I am initiating this
>>>>>>>>>> thread to finalize what we can and cannot support in the initial
>>>>>>>>>> release. Blue colored ones are yet to be implemented.
>>>>>>>>>>
>>>>>>>>>>    - Numerical Prediction
>>>>>>>>>>       - Standard Error [1]
>>>>>>>>>>       - Residual Plot [2]
>>>>>>>>>>       - Feature Importance (*Graph containing weights assigned
>>>>>>>>>>       to each of the feature in the model*)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    - Classification
>>>>>>>>>>       - Binary
>>>>>>>>>>          - ROC [3]
>>>>>>>>>>          - AUC
>>>>>>>>>>          - Confusion Matrix (*Available on spark as a static
>>>>>>>>>>          metric. But if this was calculated manually, it can be
>>>>>>>>>>          made interactive, so that user can find the optimal
>>>>>>>>>>          threshold*)
>>>>>>>>>>          - Accuracy
>>>>>>>>>>          - Feature Importance
>>>>>>>>>>       - Multi-Class
>>>>>>>>>>          - Confusion Matrix (*Available on spark*)
>>>>>>>>>>          - Accuracy
>>>>>>>>>>          - Feature Importance
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    - Clustering
>>>>>>>>>>       - Scatter plot with clustered points
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *Cross-comparing Models*
>>>>>>>>>>
>>>>>>>>>> As you can see, the major limitation we have when cross-comparing
>>>>>>>>>> models within a project is that different categories have different
>>>>>>>>>> summary statistics/plots, and hence we cannot compare two models in
>>>>>>>>>> two different categories.
>>>>>>>>>>
>>>>>>>>>> Following are the possibilities:
>>>>>>>>>>
>>>>>>>>>>    - ROC can be used to compare Binary classification models.
>>>>>>>>>>    - Cobweb (a radar chart) can be used to compare Multi-Class
>>>>>>>>>>    classification models. (This is a possible alternative to ROC in
>>>>>>>>>>    the multi-class case, but the drawback is that the graph becomes
>>>>>>>>>>    very unclear when the models have a large number of features.)
>>>>>>>>>>    [4] [5]
>>>>>>>>>>    - Accuracy can be used to compare all classification models.
>>>>>>>>>>
>>>>>>>>>> Please add if I've missed anything.
>>>>>>>>>>
>>>>>>>>>> *Ref:*
>>>>>>>>>> [1] http://onlinestatbook.com/2/regression/accuracy.html
>>>>>>>>>> [2] http://stattrek.com/regression/residual-analysis.aspx
>>>>>>>>>> [3] http://www.sciencedirect.com/science/article/pii/S016786550500303X
>>>>>>>>>> [4] http://www.academia.edu/2519022/Visualization_and_analysis_of_classifiers_performance_in_multi-class_medical_data
>>>>>>>>>> [5] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.107.8450&rep=rep1&type=pdf
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Supun
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> *Supun Sethunga*
>>>>>>>>>> Software Engineer
>>>>>>>>>> WSO2, Inc.
>>>>>>>>>> http://wso2.com/
>>>>>>>>>> lean | enterprise | middleware
>>>>>>>>>> Mobile : +94 716546324
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> ============================
>>>>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>>>> Phone: 0772360902
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Thanks & regards,
>>>>>> Nirmal
>>>>>>
>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>> Mobile: +94715779733
>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> *CD Athuraliya*
>>>> Software Engineer
>>>> WSO2, Inc.
>>>> lean . enterprise . middleware
>>>> Mobile: +94 716288847 <94716288847>
>>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
>>>> <https://twitter.com/cdathuraliya> | Blog
>>>> <http://cdathuraliya.tumblr.com/>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>>
>
>
>
>
>
> _______________________________________________
> Architecture mailing list
> [email protected]
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
>


-- 
Pruthuvi Maheshakya Wijewardena
Software Engineer
WSO2 : http://wso2.com/
Email: [email protected]
Mobile: +94711228855
