Rahul Iyer created MADLIB-998:
---------------------------------
Summary: Class weights for SVM
Key: MADLIB-998
URL: https://issues.apache.org/jira/browse/MADLIB-998
Project: Apache MADlib
Issue Type: New Feature
Components: Module: Support Vector Machines
Reporter: Rahul Iyer
Add a class weight parameter that assigns weights to specific dependent variable
values. This is useful for data with unbalanced classes, i.e. situations where one
class has (far) fewer data points than the other class(es).
The general format will be similar to that in scikit-learn, described below:
class_weight: Sets the weight for the positive and negative classes. If not
given, all classes are set to have weight one.
If class_weight = balanced, the values of y are automatically adjusted in inverse
proportion to class frequencies in the input data, i.e. the weights are set to
n_samples / (n_classes * bincount(y)).
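As a rough illustration of the 'balanced' formula above, here is a minimal Python
sketch. The function name balanced_class_weights is hypothetical and is not part
of MADlib or scikit-learn; it only mirrors n_samples / (n_classes * bincount(y))
for arbitrary hashable labels.

```python
from collections import Counter


def balanced_class_weights(y):
    """Return {label: weight} using n_samples / (n_classes * count(label)).

    Hypothetical helper for illustration only; not a MADlib function.
    """
    counts = Counter(y)        # per-class frequencies, the role of bincount(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Three 'a' rows and one 'b' row: the rarer class 'b' gets the larger weight.
weights = balanced_class_weights(['a', 'a', 'a', 'b'])
print(weights)
```

With this input, 'a' gets weight 4 / (2 * 3) and 'b' gets weight 4 / (2 * 1), so
the minority class contributes more per tuple.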
Alternatively, class_weight can be a mapping, giving the weight for each class.
E.g. for dependent variable values 'a' and 'b', the class_weight can be
{a: 2, b: 3}. Each 'a' tuple's y value would then be multiplied by 2 and
each 'b' tuple's y value by 3.
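The mapping case could be sketched as follows. This assumes a binary problem in
which labels are encoded as +1 / -1 before weighting, that the caller names the
positive label, and that labels missing from the mapping default to weight one.
None of these details are specified in this issue, and apply_class_weights is a
hypothetical name.

```python
def apply_class_weights(labels, positive_label, class_weight=None):
    """Scale each tuple's encoded y value by the weight of its class.

    Assumptions (not from the issue): +1/-1 encoding, and a default
    weight of 1.0 for any label absent from the mapping.
    """
    weighted = []
    for label in labels:
        y = 1.0 if label == positive_label else -1.0
        w = 1.0 if class_weight is None else class_weight.get(label, 1.0)
        weighted.append(w * y)
    return weighted


# With {a: 2, b: 3}: each 'a' y value is doubled, each 'b' y value is tripled.
weighted_y = apply_class_weights(['a', 'b', 'a'], positive_label='a',
                                 class_weight={'a': 2, 'b': 3})
print(weighted_y)
```

Omitting class_weight leaves every y value at +1 or -1, matching the stated
default of weight one for all classes.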
For regression, the class weights are always one.
'class_weight' will be part of the optional 'params' argument.