GitHub user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/43#discussion_r64303826
  
    --- Diff: src/ports/postgres/modules/svm/svm.py_in ---
    @@ -781,6 +785,46 @@ def _random_feature_map(schema_madlib, source_table, dependent_varname,
                            dependent_varname, grouping_col))
     
     
    +def _compute_class_weight_sql(source_table, dependent_varname,
    +                              is_svc, class_weight_str):
    +    """
    +    Args:
    +        @param is_svc: Boolean, indicates if classification or regression
    +
    +    Returns:
    +        str. String when executed in SQL computes the class weight for each tuple
    +    """
    +    if not is_svc or not class_weight_str:
    +        return "1"
    +
    +    dep_to_weight = defaultdict(float)
    +    if class_weight_str == "balanced":
    +        # use half of n_samples since only doing binary classification
    +        # Change the '2' to n_classes for multinomial
    +        n_samples_per_class = num_samples(source_table) / 2
    +        bin_count = plpy.execute("""SELECT {dep} as k, count(*) as v
    +                                    FROM {src}
    +                                    GROUP BY {dep}
    +                                 """.format(dep=dependent_varname,
    +                                            src=source_table))
    +        for each_count in bin_count:
    +            plpy.info(each_count)
    --- End diff --
    
    Remove the 'info' line (the plpy.info(each_count) debug call inside the loop).
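
For illustration only, here is a minimal sketch of how the loop might look without the debug call. The diff is cut off at that line, so the loop body below is an assumption, not the PR's actual code. The helper name `balanced_class_weights` and the sample rows are hypothetical, and the rows stand in for the `k`/`v` results that `plpy.execute` would return. Float division is used deliberately, since `/ 2` on an integer count truncates under Python 2.

```python
from collections import defaultdict


def balanced_class_weights(bin_count, n_classes=2):
    """Hypothetical helper: map each class label to
    n_samples / (n_classes * class_count).

    bin_count is a list of dict-like rows with keys 'k' (class label)
    and 'v' (row count), mimicking the query result in the diff.
    """
    n_samples = sum(row["v"] for row in bin_count)
    # Float division avoids integer truncation under Python 2.
    n_samples_per_class = float(n_samples) / n_classes

    dep_to_weight = defaultdict(float)
    for each_count in bin_count:
        # No plpy.info(...) call here: the loop only records the weight.
        dep_to_weight[each_count["k"]] = n_samples_per_class / each_count["v"]
    return dep_to_weight


# Stand-in for the rows returned by plpy.execute (assumed data).
rows = [{"k": 1, "v": 30}, {"k": -1, "v": 10}]
weights = balanced_class_weights(rows)
```

With this weighting the minority class gets the larger weight, which is the usual intent of a "balanced" setting.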

