Yes, it seems so. I've removed the WSSSE calculation code since it's not
currently used.
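
For reference, here is a minimal sketch of what a WSSSE calculation looks
like. It is not the removed code: the function names and the toy data are
assumptions made purely for illustration. It sums, over all points, the
squared distance to the nearest cluster center, which is the quantity
k-means itself tries to minimize.

```python
# Illustrative sketch only: names and data are hypothetical, not the removed code.

def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def wssse(points, centers):
    """Within Set Sum of Squared Errors: for each point, take the squared
    distance to its nearest center, then sum over all points."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


# Toy data: two well-separated groups of two points each.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]

# Centers a k-means run could plausibly converge to for k=1 and k=2
# (the overall mean, and the two group means).
centers_k1 = [(5, 5.5)]
centers_k2 = [(0, 0.5), (10, 10.5)]

print(wssse(points, centers_k1))  # 201.0
print(wssse(points, centers_k2))  # 1.0
```

On this toy data the value drops sharply from k=1 to k=2, which is the kind
of comparison across k values mentioned in the quoted reply below. For a
single fixed k it says little, since it is the objective being optimized.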

On Tue, Jul 14, 2015 at 11:56 AM, Maheshakya Wijewardena <
[email protected]> wrote:

> But again, WSSSE can be useful (only) when comparing k-means models built
> with different values of k.
>
> On Tue, Jul 14, 2015 at 11:46 AM, Maheshakya Wijewardena <
> [email protected]> wrote:
>
>> I'm not sure whether within-cluster sum of squared error would be a good
>> metric for k-means, since minimizing that error is exactly what k-means
>> does in its optimization[1]. Therefore, the result we get will always look
>> good according to that measure. I think an internal validation method[2]
>> that does not depend on the same objective that k-means optimizes would be
>> more suitable.
>>
>> [1] https://en.wikipedia.org/wiki/K-means_clustering#Description
>> [2] http://www.universitypress.org.uk/journals/cc/20-463.pdf
>>
>> On Mon, Jul 13, 2015 at 12:04 PM, Nirmal Fernando <[email protected]>
>> wrote:
>>
>>> https://spark.apache.org/docs/latest/mllib-clustering.html#k-means
>>>
>>> On Mon, Jul 13, 2015 at 12:03 PM, Nirmal Fernando <[email protected]>
>>> wrote:
>>>
>>>> Why can't we use Within Set Sum of Squared Error (WSSSE) as a measure
>>>> of clustering?
>>>>
>>>>
>>>> On Fri, May 15, 2015 at 4:34 PM, CD Athuraliya <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> We have implemented model comparison for classification and numerical
>>>>> prediction with the following measures:
>>>>>
>>>>>    - Binary and multiclass classification - Accuracy
>>>>>    - Numerical prediction - Mean squared error
>>>>>
>>>>> We are currently working on a sorted view of models according to their
>>>>> accuracy/MSE. This release will not support cross comparison for
>>>>> clustering algorithms.
>>>>>
>>>>> Thanks,
>>>>> CD
>>>>>
>>>>> On Tue, May 5, 2015 at 5:41 PM, CD Athuraliya <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Which chart types and implementations are we going to proceed with for
>>>>>> alpha? We will be able to finalize the comparison and summary views
>>>>>> based on them.
>>>>>>
>>>>>> Thanks,
>>>>>> CD
>>>>>>
>>>>>> On Fri, May 1, 2015 at 9:39 AM, Supun Sethunga <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Nirmal,
>>>>>>>
>>>>>>> During the last discussion, we decided to show a numerical value
>>>>>>> (Accuracy / Std error) next to each model in the model listing view to
>>>>>>> illustrate its performance, so that the user can get an overall idea at
>>>>>>> a glance, and to have the ROC comparison on a separate page. I think we
>>>>>>> still need to figure out where the latter would fit in the UI
>>>>>>> navigation.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Supun
>>>>>>>
>>>>>>> On Thu, Apr 30, 2015 at 6:51 PM, Nirmal Fernando <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks for summarizing, Supun. Did we think about how we are going to
>>>>>>>> create the cross-model comparison view?
>>>>>>>>
>>>>>>>> On Thu, Apr 30, 2015 at 8:33 AM, Supun Sethunga <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> [-strategy@, +architecture@]
>>>>>>>>>
>>>>>>>>> On Thu, Apr 30, 2015 at 5:58 PM, Srinath Perera <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> should go to arch@
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 30, 2015 at 6:28 AM, Srinath Perera <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Supun!! this looks good.
>>>>>>>>>>>
>>>>>>>>>>> --Srinath
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Apr 30, 2015 at 6:25 AM, Supun Sethunga <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> Following is the breakdown of the Model Summary illustrations
>>>>>>>>>>>> that can be supported by ML at the moment. I am initiating this
>>>>>>>>>>>> thread to finalize what we can and cannot support in the initial
>>>>>>>>>>>> release. The items colored blue are yet to be implemented.
>>>>>>>>>>>>
>>>>>>>>>>>>    - Numerical Prediction
>>>>>>>>>>>>       - Standard Error [1]
>>>>>>>>>>>>       - Residual Plot [2]
>>>>>>>>>>>>       - Feature Importance (*Graph containing the weights assigned
>>>>>>>>>>>>       to each of the features in the model*)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    - Classification
>>>>>>>>>>>>       - Binary
>>>>>>>>>>>>          - ROC [3]
>>>>>>>>>>>>          - AUC
>>>>>>>>>>>>          - Confusion Matrix (*Available on Spark as a static
>>>>>>>>>>>>          metric. But if this were calculated manually, it could
>>>>>>>>>>>>          be made interactive, so that the user can find the
>>>>>>>>>>>>          optimal threshold*)
>>>>>>>>>>>>          - Accuracy
>>>>>>>>>>>>          - Feature Importance
>>>>>>>>>>>>       - Multi-Class
>>>>>>>>>>>>          - Confusion Matrix (*Available on Spark*)
>>>>>>>>>>>>          - Accuracy
>>>>>>>>>>>>          - Feature Importance
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    - Clustering
>>>>>>>>>>>>       - Scatter plot with clustered points
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *Cross-comparing Models*
>>>>>>>>>>>>
>>>>>>>>>>>> As you can see, the major limitation we have when cross-comparing
>>>>>>>>>>>> models within a project is that different categories have different
>>>>>>>>>>>> summary statistics/plots, and hence we cannot compare two models
>>>>>>>>>>>> from two different categories.
>>>>>>>>>>>>
>>>>>>>>>>>> Following are the possibilities:
>>>>>>>>>>>>
>>>>>>>>>>>>    - ROC can be used to compare Binary classification models.
>>>>>>>>>>>>    - Cobweb (a radar chart) can be used to compare Multi-Class
>>>>>>>>>>>>    classification models. (This is a possible alternative to ROC
>>>>>>>>>>>>    in the multi-class case, but the drawback is that the graph
>>>>>>>>>>>>    becomes very unclear when the models have a large number of
>>>>>>>>>>>>    features.) [4] [5]
>>>>>>>>>>>>    - Accuracy can be used to compare all classification models.
>>>>>>>>>>>>
>>>>>>>>>>>> Please add if I've missed anything.
>>>>>>>>>>>>
>>>>>>>>>>>> *Ref:*
>>>>>>>>>>>> [1] http://onlinestatbook.com/2/regression/accuracy.html
>>>>>>>>>>>> [2] http://stattrek.com/regression/residual-analysis.aspx
>>>>>>>>>>>> [3] http://www.sciencedirect.com/science/article/pii/S016786550500303X
>>>>>>>>>>>> [4] http://www.academia.edu/2519022/Visualization_and_analysis_of_classifiers_performance_in_multi-class_medical_data
>>>>>>>>>>>> [5] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.107.8450&rep=rep1&type=pdf
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Supun
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> *Supun Sethunga*
>>>>>>>>>>>> Software Engineer
>>>>>>>>>>>> WSO2, Inc.
>>>>>>>>>>>> http://wso2.com/
>>>>>>>>>>>> lean | enterprise | middleware
>>>>>>>>>>>> Mobile : +94 716546324
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> ============================
>>>>>>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>>>>>> Phone: 0772360902
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Thanks & regards,
>>>>>>>> Nirmal
>>>>>>>>
>>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>>> Mobile: +94715779733
>>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *CD Athuraliya*
>>>>>> Software Engineer
>>>>>> WSO2, Inc.
>>>>>> lean . enterprise . middleware
>>>>>> Mobile: +94 716288847 <94716288847>
>>>>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
>>>>>> <https://twitter.com/cdathuraliya> | Blog
>>>>>> <http://cdathuraliya.tumblr.com/>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Architecture mailing list
>>> [email protected]
>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>
>>>
>>
>>
>> --
>> Pruthuvi Maheshakya Wijewardena
>> Software Engineer
>> WSO2 : http://wso2.com/
>> Email: [email protected]
>> Mobile: +94711228855
>>
>>
>>
>
>
>
>
>
>
>


-- 

Thanks & regards,
Nirmal

Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/