[
https://issues.apache.org/jira/browse/MADLIB-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296742#comment-15296742
]
ASF GitHub Bot commented on MADLIB-998:
---------------------------------------
GitHub user iyerr3 opened a pull request:
https://github.com/apache/incubator-madlib/pull/43
SVM: Add class weights for use with unbalanced data
JIRA: MADLIB-998
Added 'class_weight' to the 'params' argument. It can be either a string or a
dictionary-like mapping. For a string, the only option currently accepted is
'balanced'. For a mapping, the user can map values of the dependent variable
to specific double precision weights. Since SVM only supports binary
classification at present, the class_weight mapping can take at most two
entries.
As part of this work, we add a 'tuple_weight' argument to the SVM aggregate.
This allows future addition of sample weights (which would be multiplied with
the class weight).
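The way a per-tuple weight could be derived from a class_weight mapping and an optional sample weight can be sketched as follows. This is only an illustration of the behaviour described above, not the code in the pull request: the function name effective_tuple_weights, its arguments, and the choice to give unmapped classes a weight of one are all assumptions.

```python
def effective_tuple_weights(y, class_weight, sample_weight=None):
    """Hypothetical helper: per-tuple weight = class weight * sample weight.

    y             -- list of dependent variable values, one per tuple
    class_weight  -- mapping from class value to weight (at most two entries,
                     since only binary classification is supported)
    sample_weight -- optional list of per-tuple weights; defaults to all ones
    """
    if len(class_weight) > 2:
        raise ValueError("class_weight can map at most two classes")
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    if len(sample_weight) != len(y):
        raise ValueError("sample_weight must have one entry per tuple")
    # Assumption: a class missing from the mapping keeps the default weight 1.
    return [class_weight.get(label, 1.0) * w
            for label, w in zip(y, sample_weight)]


weights = effective_tuple_weights(['a', 'b', 'a'], {'a': 2.0, 'b': 3.0})
```

With no sample weights supplied, each tuple simply receives the weight of its class.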
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/iyerr3/incubator-madlib feature/svm_class_weights
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-madlib/pull/43.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #43
----
commit 545bf14c46d1ae5e75088e83abed1c26a841d4f7
Author: Rahul Iyer <[email protected]>
Date: 2016-05-20T23:28:23Z
SVM: Add class weights for use with unbalanced data
JIRA: MADLIB-998
----
> Class weights for SVM
> ---------------------
>
> Key: MADLIB-998
> URL: https://issues.apache.org/jira/browse/MADLIB-998
> Project: Apache MADlib
> Issue Type: New Feature
> Components: Module: Support Vector Machines
> Reporter: Rahul Iyer
>
> Add a class weight parameter that assigns weights to specific dependent
> variable values. This is useful for data with unbalanced classes, i.e.,
> situations where one class has (far) fewer data points than the other
> class(es).
> The general format will be similar to that in scikit-learn, described below:
> class_weight: Sets the weight for the positive and negative classes. If not
> given, all classes are set to have weight one.
> If class_weight = balanced, values of y are automatically adjusted in
> inverse proportion to class frequencies in the input data, i.e., the
> weights are set as n_samples / (n_classes * bincount(y)).
> Alternatively, class_weight can be a mapping, giving the weight for each
> class.
> E.g., for dependent variable values 'a' and 'b', the class_weight can be
> {a: 2, b: 3}. This would lead to each 'a' tuple's y value being multiplied
> by 2 and each 'b' tuple's y value being multiplied by 3.
> For regression, the class weights are always one.
> 'class_weight' will be part of the optional 'params' argument.
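The 'balanced' formula quoted above, n_samples / (n_classes * bincount(y)), can be worked through in a few lines. This is a minimal sketch in plain Python rather than MADlib's implementation; the function name balanced_class_weights is hypothetical, and collections.Counter stands in for bincount so that non-integer class labels such as 'a' and 'b' work.

```python
from collections import Counter


def balanced_class_weights(y):
    """Hypothetical helper: weight per class = n_samples / (n_classes * count)."""
    counts = Counter(y)          # number of tuples per class value
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Three 'a' tuples and one 'b' tuple: the rarer class gets the larger weight.
weights = balanced_class_weights(['a', 'a', 'a', 'b'])
# weights['a'] is 4 / (2 * 3), weights['b'] is 4 / (2 * 1)
```

When the classes are equally frequent, every weight comes out as one, which matches the default when class_weight is not given.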