Frank McQuillan created MADLIB-1095:
---------------------------------------

             Summary: Use populated parts of feature vector even if it contains one or more NULL entries
                 Key: MADLIB-1095
                 URL: https://issues.apache.org/jira/browse/MADLIB-1095
             Project: Apache MADlib
          Issue Type: Improvement
          Components: Module: Decision Tree
            Reporter: Frank McQuillan
             Fix For: v1.12


Context 

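Currently, in the decision tree and random forest (DT/RF) modules, if the
feature vector of a row contains any NULL entry, the whole row is dropped from
the training data.  This is not ideal, especially when the training data is
sparse, because the populated features in that row are discarded as well.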
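
To make the cost of the current behavior concrete, here is a minimal Python
sketch of dropping every row that has at least one NULL.  It is only an
illustration, not MADlib code: the function name drop_rows_with_nulls and the
sample rows are hypothetical, and None stands in for SQL NULL.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]


def drop_rows_with_nulls(rows: List[Row]) -> List[Row]:
    """Keep only rows whose feature vector has no None (NULL) entry."""
    return [row for row in rows if all(v is not None for v in row)]


# Hypothetical sparse training data: three of four rows have one NULL each.
rows = [
    (1.0, None, 3.0),
    (None, 2.0, 1.0),
    (4.0, 5.0, 6.0),
    (2.0, 1.0, None),
]

kept = drop_rows_with_nulls(rows)

# Count populated feature values before and after dropping rows.
populated = sum(v is not None for row in rows for v in row)
used = sum(len(row) for row in kept)

print(f"rows kept: {len(kept)} of {len(rows)}")
print(f"populated values used: {used} of {populated}")
```

In this made-up data set only one row survives, so most of the populated
values never reach training.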

Story

As a data scientist, I want the DT/RF modules to use the non-NULL parts of the
feature vector rather than discard the whole row, so that I can get better
accuracy for classification/regression on sparse data.
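
One possible reading of this story, sketched in Python below, is to score each
candidate split using only the rows where the split feature is populated.  This
is an assumption for illustration and not a statement of how MADlib implements
or will implement it: the names gini and split_impurity, the Gini criterion,
and the sample data are all hypothetical, and None stands in for SQL NULL.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]


def gini(labels: List[Hashable]) -> float:
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(
    features: List[Row], labels: List[Hashable], j: int, threshold: float
) -> Tuple[float, int]:
    """Weighted Gini impurity of the split `feature j <= threshold`.

    Rows where feature j is None are skipped for this split only; they can
    still contribute to splits on other features.  Returns the impurity and
    the number of rows that were actually used.
    """
    left = [y for x, y in zip(features, labels)
            if x[j] is not None and x[j] <= threshold]
    right = [y for x, y in zip(features, labels)
             if x[j] is not None and x[j] > threshold]
    n = len(left) + len(right)
    if n == 0:
        return 0.0, 0
    impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
    return impurity, n


# Hypothetical data: only one row is fully populated.
features = [(1.0, None), (None, 2.0), (4.0, 5.0), (2.0, None)]
labels = ["a", "a", "b", "a"]

print(split_impurity(features, labels, 0, 2.0))  # uses the 3 rows with feature 0
print(split_impurity(features, labels, 1, 2.0))  # uses the 2 rows with feature 1
```

The sketch only covers split scoring.  How a row with a NULL in the chosen
split feature is routed to a child node during training and prediction is a
separate design decision that this issue leaves open.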

Acceptance

TBD


