[ https://issues.apache.org/jira/browse/MADLIB-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15985188#comment-15985188 ]
ASF GitHub Bot commented on MADLIB-1092:
----------------------------------------
GitHub user njayaram2 opened a pull request:
https://github.com/apache/incubator-madlib/pull/126
Bugfix: Elastic net gives inconsistent result
JIRA: MADLIB-1092
- Elastic net used the total number of rows in the table as the row
count even when grouping was used. This fix uses the number of rows
in each group when computing IGD.
- Elastic net computed the mean and standard deviation of both the
independent and dependent variables over the entire table even when
grouping was used. They are now computed per group and used to
compute the scaled data when standardize=TRUE for Gaussian IGD.
- One approximation remains. During gradient computation (C++), the
mean of each dimension of the independent variable is subtracted
from every value in that dimension, but that mean is computed over
the entire table rather than per group. This approximation was
adopted because passing group-specific mean values for every row
in the table to the C++ layer is messy.
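A minimal sketch of why the row count in the first bullet matters, assuming the standard Gaussian elastic-net objective 1/(2N) * sum_i (y_i - b0 - x_i.b)^2 + lambda * ((1 - alpha)/2 * ||b||_2^2 + alpha * ||b||_1). It uses plain coordinate descent, not MADlib's IGD solver, and the names `fit_elastic_net`, `soft_threshold`, `dot` and the `n_rows` parameter are hypothetical. Dividing a group's loss by the whole-table row count instead of the group's own row count shrinks the loss relative to the penalty, which is equivalent to fitting that group with a larger lambda. In the MADLIB-1092 reproduction each group holds exactly half the table, so the old behaviour amounts to doubling lambda for both groups: both groups get the same model, which generally differs from M1.

```python
# Minimal sketch with hypothetical names: a Gaussian elastic net fitted by
# plain coordinate descent (NOT MADlib's IGD solver). Assumed objective:
#   1/(2*n) * sum_i (y_i - b0 - x_i.b)^2
#     + lam * ((1 - alpha)/2 * ||b||_2^2 + alpha * ||b||_1)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def soft_threshold(z, gamma):
    # Proximal operator of gamma * |b|: shrink z toward zero by gamma.
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_elastic_net(xs, ys, lam, alpha, n_rows=None, sweeps=2000):
    """Fit one group's rows and return (intercept, coefficients).

    n_rows is the row count that scales the loss (hypothetical knob).
    None uses the group's own row count, as the fix does; passing the
    whole-table row count mimics the pre-fix behaviour.
    """
    n = float(len(ys) if n_rows is None else n_rows)
    p = len(xs[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(sweeps):
        # The intercept is unpenalized, so its exact minimizer is the mean
        # residual over this group's rows, independent of n.
        b0 = sum(y - dot(x, b) for x, y in zip(xs, ys)) / len(ys)
        for j in range(p):
            # Correlate feature j with the partial residual that leaves
            # feature j out of the current prediction.
            z = sum(x[j] * (y - b0 - dot(x, b) + x[j] * b[j])
                    for x, y in zip(xs, ys))
            sq = sum(x[j] * x[j] for x in xs)
            b[j] = (soft_threshold(z / n, lam * alpha)
                    / (sq / n + lam * (1 - alpha)))
    return b0, b


if __name__ == "__main__":
    xs = [[1.0, 2.0], [2.0, 0.5], [3.0, 1.5], [4.0, 3.0], [5.0, 2.5]]
    ys = [2.1, 2.9, 4.2, 5.8, 6.1]
    print(fit_elastic_net(xs, ys, lam=0.1, alpha=0.5))  # per-group N
    # A group holding half the table, divided by the table count 2N:
    print(fit_elastic_net(xs, ys, lam=0.1, alpha=0.5, n_rows=2 * len(ys)))
    # Matches the line above up to rounding: lambda effectively doubled.
    print(fit_elastic_net(xs, ys, lam=0.2, alpha=0.5))
```

With this data, the `n_rows=2 * len(ys)` fit matches the `lam=0.2` fit, while the per-group fit with `lam=0.1` differs from both.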
@iyerr3
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/njayaram2/incubator-madlib bugfix/elastic_net_grouping
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-madlib/pull/126.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #126
----
commit 92bbd3d08d457c5c7096aadf1403fc5e9df6ed7a
Author: Nandish Jayaram
Date: 2017-04-24T16:46:03Z
Bugfix: Elastic net gives inconsistent result
JIRA: MADLIB-1092
----
> Elastic Net gives inconsistent results with grouping
> ----------------------------------------------------
>
> Key: MADLIB-1092
> URL: https://issues.apache.org/jira/browse/MADLIB-1092
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Regularized Regression
> Reporter: Nandish Jayaram
> Fix For: v1.11
>
>
> Elastic net train seems to be giving incorrect results when used with
> grouping.
> Steps:
> - Run elastic net (train) on a table and obtain a model (M1).
> - Create a new table with all rows in the original input table and assign
> group value 1 to every row.
> - Replicate the rows in the table and assign group value 2 to the
> replicated rows.
> - Run the elastic net train function with grouping, keeping the same
> optimization parameters as in the first run.
> Result:
> - The model obtained for each group is different from the model M1.
> - The models for the two groups are the same as each other, but differ
> from M1.
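The reproduction steps can be sketched in memory to see which statistics change when the table is grouped. This is a hedged illustration, not MADlib's test code: the data, the single feature `x`, and the helpers `make_grouped_table`, `group_stats` and `table_stats` are made up for the example. Because group 2 is an exact copy of group 1, each group's mean and population standard deviation equal the whole-table ones, while the row count halves (N versus 2N) and a sample (n - 1) standard deviation also differs.

```python
# Minimal sketch with hypothetical helper names: build the MADLIB-1092
# reproduction table in memory and compare per-group statistics with
# whole-table statistics. One feature x, one response y; the real
# reproduction runs elastic net train in SQL.
from statistics import mean, pstdev, stdev


def make_grouped_table(rows):
    # Every original (x, y) row goes into group 1, and a copy into group 2.
    return [(1, x, y) for x, y in rows] + [(2, x, y) for x, y in rows]


def column_stats(values):
    # Row count, mean, population std and sample std of one column.
    return len(values), mean(values), pstdev(values), stdev(values)


def group_stats(table, group, col):
    # col 1 is x, col 2 is y; restrict to one group's rows.
    return column_stats([row[col] for row in table if row[0] == group])


def table_stats(table, col):
    # Same statistics over the whole table, ignoring the group column.
    return column_stats([row[col] for row in table])


if __name__ == "__main__":
    rows = [(1.0, 2.1), (2.0, 2.9), (3.0, 4.2), (4.0, 5.8), (5.0, 6.1)]
    table = make_grouped_table(rows)
    for col, name in ((1, "x"), (2, "y")):
        print(name, "group 1:", group_stats(table, 1, col))
        print(name, "table:  ", table_stats(table, col))
```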
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)