Frank McQuillan created MADLIB-1095:
---------------------------------------
Summary: Use populated parts of feature vector even if it contains
one or more NULL entries
Key: MADLIB-1095
URL: https://issues.apache.org/jira/browse/MADLIB-1095
Project: Apache MADlib
Issue Type: Improvement
Components: Module: Decision Tree
Reporter: Frank McQuillan
Fix For: v1.12
Context
Currently, in the decision tree (DT) and random forest (RF) modules, if the
feature vector contains any NULLs, the whole row is dropped from the training
data. This throws away the populated features in that row, which hurts most
when the training data is sparse.
Story
As a data scientist, I want the DT/RF modules to use the non-NULL parts of the
feature vector, and not discard the whole row, so that I can get better
accuracy for classification/regression in the case of sparse data.
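
To illustrate the difference, here is a minimal Python sketch. It is not
MADlib code and not a proposed implementation. It assumes one possible
approach: score each candidate split using only the rows where that feature is
non-NULL. All names (complete_rows, usable_rows_for_feature, split_gain) and
the toy data are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Row = Sequence[Optional[float]]

def complete_rows(rows: Sequence[Row]) -> List[int]:
    """Behavior described in Context: keep only rows with no NULL at all."""
    return [i for i, r in enumerate(rows) if all(v is not None for v in r)]

def usable_rows_for_feature(rows: Sequence[Row], feature: int) -> List[int]:
    """Assumed alternative: a row is usable for a feature if that entry is set."""
    return [i for i, r in enumerate(rows) if r[feature] is not None]

def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: Dict[int, int] = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_gain(rows: Sequence[Row], labels: Sequence[int],
               feature: int, threshold: float) -> float:
    """Impurity reduction of `feature <= threshold`, ignoring only the rows
    where this particular feature is NULL."""
    idx = usable_rows_for_feature(rows, feature)
    if not idx:
        return 0.0
    parent = [labels[i] for i in idx]
    left = [labels[i] for i in idx if rows[i][feature] <= threshold]
    right = [labels[i] for i in idx if rows[i][feature] > threshold]
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)

# Toy data: two features, None stands in for SQL NULL.
rows = [(1.0, None), (2.0, 5.0), (None, 6.0), (8.0, None), (9.0, 1.0)]
labels = [0, 0, 0, 1, 1]

kept_today = complete_rows(rows)                # rows with no NULL at all
kept_for_f0 = usable_rows_for_feature(rows, 0)  # rows usable for feature 0
gain_f0 = split_gain(rows, labels, feature=0, threshold=5.0)
```

In this toy set, dropping every row that has a NULL leaves 2 of 5 rows, while
feature 0 on its own has 4 usable rows. Other strategies, such as surrogate
splits or treating NULL as its own category, would address the same story; the
sketch only shows why row-level dropping loses information.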
Acceptance
TBD