GitHub user iyerr3 opened a pull request:

    https://github.com/apache/madlib/pull/296

    DT/RF: Ensure cat features are recorded per group

    JIRA: MADLIB-1254
    
    If tree_train/forest_train is run with grouping enabled and one of
    the groups has a categorical feature with just a single level, then
    that feature is eliminated for that group. If other groups retain
    the feature, then we end up with an incorrect "bins" data structure
    built as part of DT.
    
    This commit fixes this issue by recording the categorical features
    present in each group separately.
    
    Closes #295
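
The idea of recording categorical features per group can be illustrated with a minimal sketch. This is not MADlib's actual code: the function name `cat_features_per_group`, the row layout, and the column names are hypothetical, chosen only to show why a single shared feature list breaks when one group has a single-level feature.

```python
from collections import defaultdict


def cat_features_per_group(rows, group_col, cat_cols):
    """Return {group value: categorical columns with more than one level}.

    Hypothetical helper for illustration only; not part of MADlib.
    """
    # For every group, collect the distinct levels seen in each column.
    levels = defaultdict(lambda: {c: set() for c in cat_cols})
    for row in rows:
        for c in cat_cols:
            levels[row[group_col]][c].add(row[c])
    # A column with a single level carries no split information for that
    # group, so it is dropped for that group only.
    return {
        group: [c for c in cat_cols if len(seen[c]) > 1]
        for group, seen in levels.items()
    }


rows = [
    {"grp": "a", "color": "red", "size": "S"},
    {"grp": "a", "color": "red", "size": "L"},
    {"grp": "b", "color": "red", "size": "S"},
    {"grp": "b", "color": "blue", "size": "L"},
]
per_group = cat_features_per_group(rows, "grp", ["color", "size"])
```

In this toy data, group "a" keeps only `size` while group "b" keeps both columns, so any per-feature structure has to be indexed against each group's own feature list rather than one list shared by all groups.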

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/madlib/madlib bugfix/rf_grouping_cat_levels

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/296.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #296
    
----
commit bf5fa81c264471729ef06ee4af8a27b41f22b45a
Author: Rahul Iyer <riyer@...>
Date:   2018-07-18T00:10:04Z

    DT/RF: Ensure cat features are recorded per group
    
    JIRA: MADLIB-1254
    
    If tree_train/forest_train is run with grouping enabled and one of
    the groups has a categorical feature with just a single level, then
    that feature is eliminated for that group. If other groups retain
    the feature, then we end up with an incorrect "bins" data structure
    built as part of DT.
    
    This commit fixes this issue by recording the categorical features
    present in each group separately.
    
    Closes #295

----

