[jira] [Commented] (MADLIB-1095) Use populated parts of feature vector even if it contains one or more NULL entries

ASF GitHub Bot (JIRA) Tue, 25 Apr 2017 22:23:38 -0700

    [ https://issues.apache.org/jira/browse/MADLIB-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984196#comment-15984196 ]


ASF GitHub Bot commented on MADLIB-1095:
----------------------------------------

GitHub user iyerr3 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/125

    DT: Include rows with NULL features in training

    JIRA: MADLIB-1095
    
    This commit enables decision tree training to include rows that have
    NULL feature values. For each such row, the features that are NULL are
    ignored during training, while the features with non-NULL values are
    still used.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib bugfix/dt_null_rows

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/125.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #125
    
----
commit 7d41ee5f091c5aa56580095b555a6722b519f009
Author: Rahul Iyer <[email protected]>
Date:   2017-04-26T05:15:35Z

    DT: Include rows with NULL features in training
    
    JIRA: MADLIB-1095
    
    This commit enables decision tree training to include rows that have
    NULL feature values. For each such row, the features that are NULL are
    ignored during training, while the features with non-NULL values are
    still used.

----
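
The behavior described in the commit message can be illustrated with a
small sketch. This is not the code in the pull request and not MADlib's
API: the function names (gini, best_split), the use of Gini impurity, and
the toy data are all assumptions made for illustration. It only shows the
general idea that a row contributes to the split statistics of every
feature it has populated, and is skipped only for the features where it
is NULL (None here). How a row is routed down the tree when the chosen
split feature is NULL is a separate question that this sketch does not
cover.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_impurity) of the best split.

    Hypothetical helper: a row is used for feature j only if rows[i][j]
    is not None, so a row with some NULLs still counts for its other
    features instead of being discarded entirely.
    """
    best = None
    n_features = len(rows[0])
    for j in range(n_features):
        # Keep only the (value, label) pairs where feature j is populated.
        known = [(r[j], y) for r, y in zip(rows, labels) if r[j] is not None]
        values = sorted({v for v, _ in known})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for v, y in known if v <= t]
            right = [y for v, y in known if v > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(known)
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


# Toy data (made up): three of the five rows contain a NULL feature.
rows = [(1.0, None), (2.0, 5.0), (None, 6.0), (8.0, None), (9.0, 1.0)]
labels = ["a", "a", "a", "b", "b"]

result = best_split(rows, labels)
print(result)  # prints (0, 5.0, 0.0)
```

In this toy data, discarding every row that contains a NULL would leave
only two of the five rows for training. With the per-feature filtering
above, feature 0 is evaluated on four rows and feature 1 on three.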


> Use populated parts of feature vector even if it contains one or more NULL 
> entries
> ----------------------------------------------------------------------------------
>
>                 Key: MADLIB-1095
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1095
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Decision Tree
>            Reporter: Frank McQuillan
>            Priority: Minor
>             Fix For: v1.11
>
>
> Context 
> Currently in DT/RF if the feature vector contains any NULLs, the whole row 
> will be ignored in the training data.  This is not ideal, especially in the 
> case where training data is sparse.
> Story
> As a data scientist, I want the DT/RF modules to use the non-NULL parts of 
> the feature vector, and not discard the whole row, so that I can get better 
> accuracy for classification/regression in the case of sparse data.
> Acceptance
> TBD



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
