njayaram2 opened a new pull request #361: Minibatch Preprocessor DL: Add optional num_classes param.
URL: https://github.com/apache/madlib/pull/361

The current `minibatch_preprocessor_dl()` module looks at the input table to find the number of distinct categories (class values) for the dependent variable, and uses that number as the size of the one-hot encoded array. This can cause the madlib_keras fit function to fail if the `num_classes` defined in the model architecture differs from (for example, is greater than) the size of the one-hot encoded array.

This commit adds two features:

1) A new optional parameter to `minibatch_preprocessor_dl()` that determines the length of the one-hot encoded vector for the dependent variable. If the parameter is set to NULL, the length equals the number of distinct class values found in the dataset; otherwise, `num_classes` must be greater than or equal to the number of distinct class values. The `class_values` column in the summary table contains an array of the class values associated with the one-hot encoded vector. That array holds NULL for any class value that has no representation in the dataset.

2) NULL is now supported as a valid class value for the dependent variable.
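
The encoding rule described above can be illustrated with a minimal Python sketch. This is not the MADlib implementation: the function name `one_hot_encode`, the ordering of `class_values` (a NULL class first, padding NULLs last), and the use of Python `None` to stand in for SQL NULL are all assumptions made for illustration.

```python
def one_hot_encode(dependent_values, num_classes=None):
    """Hypothetical helper: one-hot encode values, padding to num_classes.

    Returns (class_values, encoded_rows). None stands in for SQL NULL.
    """
    distinct = set(dependent_values)
    # Sort the non-NULL class values; the ordering here is an assumption.
    class_values = sorted(v for v in distinct if v is not None)
    # NULL is treated as a valid class value of its own (placed first here).
    if None in distinct:
        class_values.insert(0, None)
    n_found = len(class_values)

    if num_classes is None:
        # Default: vector length equals the number of distinct class values.
        num_classes = n_found
    elif num_classes < n_found:
        raise ValueError(
            "num_classes must be >= the number of distinct class values"
        )

    # Map each class value that was found to its position in the vector.
    index = {v: i for i, v in enumerate(class_values)}
    # Pad with None for class slots that have no representation in the data.
    class_values = class_values + [None] * (num_classes - n_found)

    encoded_rows = []
    for v in dependent_values:
        row = [0] * num_classes
        row[index[v]] = 1
        encoded_rows.append(row)
    return class_values, encoded_rows


if __name__ == "__main__":
    classes, rows = one_hot_encode(["cat", "dog", "cat"], num_classes=3)
    print(classes)
    print(rows)
```

With `num_classes=3` and only two distinct values in the data, every encoded row has length 3 and the third slot is never set, which is the case where the architecture declares more classes than the dataset contains.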
