Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/259#discussion_r180576675

    --- Diff: src/ports/postgres/modules/utilities/minibatch_preprocessing.sql_in ---
    @@ -91,6 +92,22 @@ minibatch_preprocessor(
         When this value is NULL, no grouping is used and a single preprocessing step is performed for the whole data set.
       </dd>
    +
    +  <dt>one_hot_encode_int_dep_var (optional)</dt>
    +  <dd> BOOLEAN. default: FALSE.
    +  A flag to decide whether to one-hot encode dependent variables that are
    +scalar integers. This parameter is ignored if the dependent variable is not a
    +scalar integer.
    +
    +@note The mini-batch preprocessor automatically encodes
    +dependent variables that are boolean and character types such as text, char and
    +varchar. However, scalar integers are a special case because they can be used
    +in both classification and regression problems, so you must tell the mini-batch
    +preprocessor whether you want to encode them or not. In the case that you have
    +already encoded the dependent variable yourself, you can ignore this parameter.
    +Also, if you want to encode float values for some reason, cast them to text
    +first.
    --- End diff --

    +1 for the explanation.
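
For anyone skimming the thread, here is a rough Python sketch of the rule the note describes. It is only an illustration, not MADlib's implementation: the helper name `encode_dependent_var` is hypothetical, and ordering the classes by sorted value is an assumption made just for this example.

```python
from numbers import Integral


def encode_dependent_var(values, one_hot_encode_int_dep_var=False):
    """Hypothetical helper mirroring the documented rule; not MADlib code."""
    if not values:
        return []
    first = values[0]
    # bool is a subclass of int in Python, so check it before the integer test.
    is_bool = isinstance(first, bool)
    is_text = isinstance(first, str)
    is_int = isinstance(first, Integral) and not is_bool
    # Boolean and character values are always encoded; integers only when the
    # flag is set; anything else (e.g. floats) is passed through unchanged.
    if not (is_bool or is_text or (is_int and one_hot_encode_int_dep_var)):
        return list(values)
    # Assumption for this sketch: classes are ordered by sorted value.
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [[1 if index[v] == j else 0 for j in range(len(classes))]
            for v in values]


# Integers are left alone by default (regression-style use).
print(encode_dependent_var([2, 0, 2]))
# Integers are encoded when the flag is set (classification-style use).
print(encode_dependent_var([2, 0, 2], one_hot_encode_int_dep_var=True))
# Floats are never encoded directly; cast them to text first.
print(encode_dependent_var([str(v) for v in [1.5, 2.5]]))
```

The flag only matters in the integer branch, which matches the doc text saying the parameter is ignored for any other dependent variable type.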