jingyimei opened a new pull request #357: Utilities: Add one-hot encode for 
dependent variable in Minibatch DL
URL: https://github.com/apache/madlib/pull/357
 
 
   JIRA:  MADLIB-1303
   This PR adds one-hot encoding to the minibatch preprocessor DL class. 
One-hot encoding applies to dependent variables of all types: boolean, 
character types such as text, char and varchar, integers and floats. If the 
dependent variable is already an array, we assume it is already one-hot 
encoded, cast it to int[] and pass it along.
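   
   As a rough illustration of the behavior described above (a standalone 
Python sketch, not the code in this PR): the function name `one_hot_encode`, 
the sorted ordering of categories and the `None` returned for array input are 
all assumptions made for the example.

```python
def one_hot_encode(values):
    """Hypothetical helper: one-hot encode a column of dependent values.

    Returns (encoded_rows, class_values). Assumes all values in the
    column share one type (all text, all int, all bool, ...).
    """
    # Array input: assume it is already one-hot encoded and only cast
    # each element to int, mirroring the int[] cast described above.
    if values and all(isinstance(v, (list, tuple)) for v in values):
        return [[int(x) for x in v] for v in values], None

    # Scalar input: collect the distinct categories. Sorting them is an
    # assumption made here so that the column order is deterministic.
    class_values = sorted(set(values))
    position = {c: i for i, c in enumerate(class_values)}

    encoded = []
    for v in values:
        row = [0] * len(class_values)
        row[position[v]] = 1  # mark the column for this row's category
        encoded.append(row)
    return encoded, class_values
```

   For example, `one_hot_encode(["cat", "dog", "cat"])` yields the rows 
`[1, 0]`, `[0, 1]`, `[1, 0]` with categories `["cat", "dog"]`. The list of 
categories plays the role that `class_values` has in the summary table.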
   
   This PR also removes the optional parameter `dependent_offset` from the 
current interface, since one-hot encoding is the more general solution.
   
   In addition, a column named `class_values` is added to the output summary 
table to record the one-hot encoding categories.
   
   Co-authored-by: Ekta Khanna <[email protected]>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
