[
https://issues.apache.org/jira/browse/MADLIB-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984196#comment-15984196
]
ASF GitHub Bot commented on MADLIB-1095:
----------------------------------------
GitHub user iyerr3 opened a pull request:
https://github.com/apache/incubator-madlib/pull/125
DT: Include rows with NULL features in training
JIRA: MADLIB-1095
This commit lets decision tree training include rows whose feature
vectors contain NULL values. For each such row, the features that are
NULL are ignored during training, while the row's non-NULL features
are still used.
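As a rough illustration of the idea (not the MADlib implementation, which
is not shown in this message), the sketch below scores a candidate split
on one feature and skips only the rows where that particular feature is
NULL (None here). The function names, the Gini criterion, and the toy data
are all assumptions made for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(rows, labels, feature, threshold):
    """Weighted Gini impurity of the split rows[i][feature] <= threshold.

    A row whose value for `feature` is None is skipped for this feature
    only; the same row still contributes when other features are scored.
    Returns (impurity, number_of_rows_used).
    """
    left, right = [], []
    for row, label in zip(rows, labels):
        value = row[feature]
        if value is None:
            continue  # NULL for this feature: ignore the row here only
        (left if value <= threshold else right).append(label)
    used = len(left) + len(right)
    if used == 0:
        return None, 0
    impurity = (len(left) * gini(left) + len(right) * gini(right)) / used
    return impurity, used


# Hypothetical toy data: two of the four rows have one NULL feature each.
rows = [
    {"x1": 1.0, "x2": None},
    {"x1": 2.0, "x2": 5.0},
    {"x1": None, "x2": 7.0},
    {"x1": 4.0, "x2": 8.0},
]
labels = ["a", "a", "b", "b"]

print(split_impurity(rows, labels, "x1", 2.0))
print(split_impurity(rows, labels, "x2", 5.0))
```

In this toy data each feature is scored on three rows. Dropping every row
that contains any NULL would leave only two rows for both features.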
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/iyerr3/incubator-madlib bugfix/dt_null_rows
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-madlib/pull/125.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #125
----
commit 7d41ee5f091c5aa56580095b555a6722b519f009
Author: Rahul Iyer
Date: 2017-04-26T05:15:35Z
DT: Include rows with NULL features in training
JIRA: MADLIB-1095
This commit lets decision tree training include rows whose feature
vectors contain NULL values. For each such row, the features that are
NULL are ignored during training, while the row's non-NULL features
are still used.
----
> Use populated parts of feature vector even if it contains one or more NULL
> entries
> ----------------------------------------------------------------------------------
>
> Key: MADLIB-1095
> URL: https://issues.apache.org/jira/browse/MADLIB-1095
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Decision Tree
> Reporter: Frank McQuillan
> Priority: Minor
> Fix For: v1.11
>
>
> Context
> Currently in DT/RF if the feature vector contains any NULLs, the whole row
> will be ignored in the training data. This is not ideal, especially in the
> case where training data is sparse.
> Story
> As a data scientist, I want the DT/RF modules to use the non-NULL parts of
> the feature vector, and not discard the whole row, so that I can get better
> accuracy for classification/regression in the case of sparse data.
> Acceptance
> TBD