njayaram2 commented on a change in pull request #361: Minibatch Preprocessor 
DL: Add optional num_classes param.
URL: https://github.com/apache/madlib/pull/361#discussion_r271494568
 
 

 ##########
 File path: src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in
 ##########
 @@ -363,21 +365,70 @@ class MiniBatchPreProcessorDL(MiniBatchPreProcessor):
 
         self._validate_args()
         self.num_of_buffers = self._get_num_buffers()
-        self.to_one_hot_encode = True
 
 Review comment:
   Our 1-hot encoding follows the standard one-hot encoding convention, which 
is different from what `keras.to_categorical` does. For example, if there are 3 
distinct class values captured in a list `y=[10, 11, 12]`, then the 1-hot 
encoded vector created by `keras.to_categorical(y)` is of size 13 (largest class 
value + 1). If it is called as `keras.to_categorical(y, num_classes=4)`, it 
errors out.
   The 1-hot encoding done in MADlib would create a 1-hot encoded vector of 
size 4 in both cases.
   
   I would say keras' 1-hot encoding is actually not the standard way of doing 
it.
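   
   To make the difference concrete, here is a minimal pure-Python sketch of the 
two conventions. It is not the MADlib or keras implementation: both function 
names are hypothetical, and the default width used when `num_classes` is omitted 
in the second function is an assumption made only for illustration.

```python
def one_hot_by_index(labels, num_classes=None):
    """Treat each label as a column index, so the width is max(label) + 1.

    Illustrates the index-based behavior described in the comment above;
    this is a hypothetical helper, not a keras function.
    """
    width = max(labels) + 1 if num_classes is None else num_classes
    rows = []
    for label in labels:
        if label >= width:
            raise IndexError(
                "label %d does not fit in %d classes" % (label, width))
        row = [0] * width
        row[label] = 1
        rows.append(row)
    return rows


def one_hot_by_distinct_value(labels, num_classes=None):
    """Map the sorted distinct labels to positions 0..k-1, padded to num_classes.

    Hypothetical helper. Assumption: when num_classes is omitted, the width
    defaults to the number of distinct labels.
    """
    distinct = sorted(set(labels))
    width = len(distinct) if num_classes is None else num_classes
    if width < len(distinct):
        raise ValueError("num_classes is smaller than the number of labels")
    position = {value: i for i, value in enumerate(distinct)}
    rows = []
    for label in labels:
        row = [0] * width
        row[position[label]] = 1
        rows.append(row)
    return rows


y = [10, 11, 12]
print(len(one_hot_by_index(y)[0]))                   # 13
print(one_hot_by_distinct_value(y, num_classes=4))
# [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
```

   With `num_classes=4`, the index-based helper raises an `IndexError` for 
`y`, while the distinct-value helper returns vectors of size 4 whose last 
column is unused padding.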

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
