jingyimei opened a new pull request #357: Utilities: Add one-hot encode for dependent variable in Minibatch DL
URL: https://github.com/apache/madlib/pull/357

JIRA: MADLIB-1303

This PR adds one-hot encoding to the minibatch preprocessor DL class. One-hot encoding applies to all types: boolean, character types (text, char and varchar), integers and floats. If the dependent variable is already an array, we assume it is already one-hot encoded, cast it to int[], and pass it along.

This PR also removes the optional param `dependent_offset` from the current interface, since one-hot encoding is the more general solution. In addition, a column named `class_values` is added to the output summary table to reflect the one-hot encoding categories.

Co-authored-by: Ekta Khanna <[email protected]>
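
For readers unfamiliar with the encoding described above, here is a minimal Python sketch of the idea. It is not the code in this PR: the function name `one_hot_encode`, the sorted ordering of categories, and returning `None` for the categories of an already-encoded column are all assumptions made for illustration.

```python
from typing import Any, List, Optional, Sequence, Tuple


def one_hot_encode(
    dependent: Sequence[Any],
) -> Tuple[Optional[List[Any]], List[List[int]]]:
    """Return (class_values, encoded_rows) for a dependent-variable column.

    Hypothetical helper for illustration only; not part of MADlib.
    """
    if not dependent:
        return [], []

    # Array-valued rows are assumed to be one-hot encoded already,
    # so they are only cast to integers and passed along.
    if all(isinstance(v, (list, tuple)) for v in dependent):
        return None, [[int(x) for x in v] for v in dependent]

    # The distinct values define the encoding categories.
    # Sorting them is an assumption of this sketch.
    class_values = sorted(set(dependent))
    position = {value: i for i, value in enumerate(class_values)}

    encoded_rows = []
    for v in dependent:
        row = [0] * len(class_values)
        row[position[v]] = 1  # mark the slot of this row's category
        encoded_rows.append(row)
    return class_values, encoded_rows
```

In this sketch, the first return value plays the role of the `class_values` column in the summary table: it records which category each position of the encoded array stands for.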
