https://spark.apache.org/docs/latest/mllib-clustering.html#k-means

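For reference, WSSSE is just the sum, over all points, of the squared Euclidean distance from each point to its nearest cluster center. Below is a minimal plain-Python sketch of that definition. It does not use the Spark API, and the function name `wssse` and the sample points and centers are made up for illustration.

```python
def wssse(points, centers):
    """Within Set Sum of Squared Errors.

    For every point, take the squared Euclidean distance to its
    nearest center, then sum those distances over all points.
    """
    total = 0.0
    for p in points:
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c))
            for c in centers
        )
    return total


# Hypothetical data: two tight groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]

# Hypothetical centers for a k=1 model and a k=2 model.
centers_k1 = [(5.0, 5.5)]
centers_k2 = [(0.0, 0.5), (10.0, 10.5)]

print(wssse(points, centers_k1))
print(wssse(points, centers_k2))
```

One caveat if we use it for cross comparison: WSSSE tends to shrink as k grows, so it is probably most meaningful between models trained on the same dataset with the same number of clusters.
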
On Mon, Jul 13, 2015 at 12:03 PM, Nirmal Fernando <[email protected]> wrote:

> Why can't we use Within Set Sum of Squared Error (WSSSE) as a measure of
> clustering quality?
>
>
> On Fri, May 15, 2015 at 4:34 PM, CD Athuraliya <[email protected]>
> wrote:
>
>> Hi all,
>>
>> We have implemented model comparison for classification and numerical
>> prediction with the following measures:
>>
>>    - Binary and multiclass classification - Accuracy
>>    - Numerical prediction - Mean squared error
>>
>> We are currently working on a sorted view of models according to their
>> accuracy/MSE. This release will not support cross comparison for clustering
>> algorithms.
>>
>> Thanks,
>> CD
>>
>> On Tue, May 5, 2015 at 5:41 PM, CD Athuraliya <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> Which chart types and implementations are we going to proceed with for
>>> alpha? Once those are decided, we can finalize the comparison and summary
>>> views.
>>>
>>> Thanks,
>>> CD
>>>
>>> On Fri, May 1, 2015 at 9:39 AM, Supun Sethunga <[email protected]> wrote:
>>>
>>>> Hi Nirmal,
>>>>
>>>> In the last discussion, we decided to show a numerical value (accuracy /
>>>> standard error) next to each model in the model listing view, so that the
>>>> user can get an overall idea of its performance at a glance, and to put
>>>> the ROC comparison on a separate page. I think we still need to figure out
>>>> where the latter would fit in the UI navigation.
>>>>
>>>> Thanks,
>>>> Supun
>>>>
>>>> On Thu, Apr 30, 2015 at 6:51 PM, Nirmal Fernando <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks for summarizing, Supun. Did we think about how we are going to
>>>>> create the cross-model comparison view?
>>>>>
>>>>> On Thu, Apr 30, 2015 at 8:33 AM, Supun Sethunga <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> [-strategy@, +architecture@]
>>>>>>
>>>>>> On Thu, Apr 30, 2015 at 5:58 PM, Srinath Perera <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> should go to arch@
>>>>>>>
>>>>>>> On Thu, Apr 30, 2015 at 6:28 AM, Srinath Perera <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks Supun!! this looks good.
>>>>>>>>
>>>>>>>> --Srinath
>>>>>>>>
>>>>>>>> On Thu, Apr 30, 2015 at 6:25 AM, Supun Sethunga <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Following is the breakdown of the Model Summary illustrations
>>>>>>>>> that ML can support at the moment. I am initiating this thread to
>>>>>>>>> finalize what we can and cannot support in the initial release.
>>>>>>>>> Blue colored ones are yet to be implemented.
>>>>>>>>>
>>>>>>>>>    - Numerical Prediction
>>>>>>>>>       - Standard Error [1]
>>>>>>>>>       - Residual Plot [2]
>>>>>>>>>       - Feature Importance (*Graph containing the weights assigned
>>>>>>>>>       to each feature in the model*)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    - Classification
>>>>>>>>>       - Binary
>>>>>>>>>          - ROC [3]
>>>>>>>>>             - AUC
>>>>>>>>>          - Confusion Matrix (*Available on Spark as a static
>>>>>>>>>          metric. But if this was calculated manually, it can be
>>>>>>>>>          made interactive, so that the user can find the optimal
>>>>>>>>>          threshold*)
>>>>>>>>>          - Accuracy
>>>>>>>>>          - Feature Importance
>>>>>>>>>       - Multi-Class
>>>>>>>>>          - Confusion Matrix (*Available on Spark*)
>>>>>>>>>          - Accuracy
>>>>>>>>>          - Feature Importance
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    - Clustering
>>>>>>>>>       - Scatter plot with clustered points
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *Cross-comparing Models*
>>>>>>>>>
>>>>>>>>> As you can see, the major limitation in cross-comparing models
>>>>>>>>> within a project is that different categories have different summary
>>>>>>>>> statistics/plots, and hence we cannot compare two models from two
>>>>>>>>> different categories.
>>>>>>>>>
>>>>>>>>> Following are the possibilities:
>>>>>>>>>
>>>>>>>>>    - ROC can be used to compare Binary classification models.
>>>>>>>>>    - Cobweb (a radar chart) can be used to compare Multi-Class
>>>>>>>>>    classification models. (This is a possible alternative to ROC in
>>>>>>>>>    the multi-class case. The drawback is that the graph becomes very
>>>>>>>>>    unclear when the models have too many features.) [4] [5]
>>>>>>>>>    - Accuracy can be used to compare all classification models.
>>>>>>>>>
>>>>>>>>> Please add if I've missed anything.
>>>>>>>>>
>>>>>>>>> *Ref:*
>>>>>>>>> [1] http://onlinestatbook.com/2/regression/accuracy.html
>>>>>>>>> [2] http://stattrek.com/regression/residual-analysis.aspx
>>>>>>>>> [3] http://www.sciencedirect.com/science/article/pii/S016786550500303X
>>>>>>>>> [4] http://www.academia.edu/2519022/Visualization_and_analysis_of_classifiers_performance_in_multi-class_medical_data
>>>>>>>>> [5] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.107.8450&rep=rep1&type=pdf
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Supun
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> *Supun Sethunga*
>>>>>>>>> Software Engineer
>>>>>>>>> WSO2, Inc.
>>>>>>>>> http://wso2.com/
>>>>>>>>> lean | enterprise | middleware
>>>>>>>>> Mobile : +94 716546324
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> ============================
>>>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>>> Phone: 0772360902
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Thanks & regards,
>>>>> Nirmal
>>>>>
>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>> Mobile: +94715779733
>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> *CD Athuraliya*
>>> Software Engineer
>>> WSO2, Inc.
>>> lean . enterprise . middleware
>>> Mobile: +94 716288847 <94716288847>
>>> LinkedIn <http://lk.linkedin.com/in/cdathuraliya> | Twitter
>>> <https://twitter.com/cdathuraliya> | Blog
>>> <http://cdathuraliya.tumblr.com/>
>>>
>>
>>
>>
>>
>
>
>
>
>
>


-- 

Thanks & regards,
Nirmal

Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
