[ https://issues.apache.org/jira/browse/MADLIB-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296742#comment-15296742 ]

ASF GitHub Bot commented on MADLIB-998:
---------------------------------------

GitHub user iyerr3 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/43

    SVM: Add class weights for use with unbalanced data

    JIRA: MADLIB-998
    
    Added 'class_weight' to the 'params' argument. It can be either a string or a
    dictionary-like mapping. For a string, we currently accept only
    'balanced' as an option. For a mapping, the user can map values of the
    dependent variable to specific double precision weights. Since SVM
    currently supports only binary classification, the class_weight mapping can
    take at most two entries.
    
    As part of this work, we add a 'tuple_weight' argument to the SVM aggregate.
    This allows future addition of sample weights (which would be multiplied with
    the class weight).
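
As a rough illustration of how a per-tuple weight could combine a class weight with a future sample weight, here is a minimal Python sketch. It is not MADlib's implementation: the function names `tuple_weight` and `weighted_hinge_loss` are hypothetical, and the use of a hinge loss with labels in {-1, +1} is an assumption made only for the example.

```python
def tuple_weight(label, class_weight, sample_weight=1.0):
    """Hypothetical helper: per-tuple weight = class weight * sample weight.

    Labels missing from the mapping default to weight one.
    """
    return class_weight.get(label, 1.0) * sample_weight


def weighted_hinge_loss(score, y, weight):
    """Hinge loss for one tuple, scaled by its weight (assumes y in {-1, +1})."""
    return weight * max(0.0, 1.0 - y * score)


# Illustrative mapping for a binary dependent variable with values 'a' and 'b'.
class_weight = {'a': 2.0, 'b': 3.0}

w_a = tuple_weight('a', class_weight)        # class weight only
w_b = tuple_weight('b', class_weight, 0.5)   # class weight times a sample weight
loss = weighted_hinge_loss(score=0.25, y=1, weight=w_a)
```

With a sample weight fixed at one, the tuple weight reduces to the class weight, which matches the behaviour described above.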

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib feature/svm_class_weights

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/43.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #43
    
----
commit 545bf14c46d1ae5e75088e83abed1c26a841d4f7
Author: Rahul Iyer <[email protected]>
Date:   2016-05-20T23:28:23Z

    SVM: Add class weights for use with unbalanced data
    
    JIRA: MADLIB-998
    
    Added 'class_weight' to the 'params' argument. It can be either a string or a
    dictionary-like mapping. For a string, we currently accept only
    'balanced' as an option. For a mapping, the user can map values of the
    dependent variable to specific double precision weights. Since SVM
    currently supports only binary classification, the class_weight mapping can
    take at most two entries.
    
    As part of this work, we add a 'tuple_weight' argument to the SVM aggregate.
    This allows future addition of sample weights (which would be multiplied with
    the class weight).

----


> Class weights for SVM
> ---------------------
>
>                 Key: MADLIB-998
>                 URL: https://issues.apache.org/jira/browse/MADLIB-998
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Support Vector Machines
>            Reporter: Rahul Iyer
>
> Add a class weight parameter to assign weights to specific dependent variable
> values. This is useful for data with unbalanced classes, i.e. situations where
> one class has (far) fewer data points than the other class(es).
> The general format will be similar to that in scikit-learn, described below: 
> class_weight: Sets the weight for the positive and negative classes. If not 
> given, all classes are set to have weight one.
> If class_weight = balanced, the weights are automatically set to be
> inversely proportional to class frequencies in the input data, i.e. the
> weights are set as n_samples / (n_classes * bincount(y)).
> Alternatively, class_weight can be a mapping, giving the weight for each 
> class.
> E.g. for dependent variable values 'a' and 'b', the class_weight can be
> {a: 2, b: 3}. This would lead to each 'a' tuple's y value being multiplied by 2
> and each 'b' tuple's y value being multiplied by 3.
> For regression, the class weights are always one.
> 'class_weight' will be part of the optional 'params' argument. 
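
The 'balanced' formula quoted above, n_samples / (n_classes * bincount(y)), can be worked through with a small Python sketch. The function name `balanced_class_weights` is hypothetical and the sketch only illustrates the arithmetic; it does not show how MADlib computes the weights internally.

```python
from collections import Counter


def balanced_class_weights(y):
    """Return {label: n_samples / (n_classes * count(label))} for the labels in y."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Unbalanced example: eight 'a' tuples and two 'b' tuples.
y = ['a'] * 8 + ['b'] * 2
weights = balanced_class_weights(y)
print(weights)  # {'a': 0.625, 'b': 2.5}
```

The rarer class 'b' receives the larger weight (10 / (2 * 2) = 2.5), while the common class 'a' is down-weighted (10 / (2 * 8) = 0.625).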



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
