Github user njayaram2 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/259#discussion_r180576675
--- Diff: src/ports/postgres/modules/utilities/minibatch_preprocessing.sql_in ---
@@ -91,6 +92,22 @@ minibatch_preprocessor(
When this value is NULL, no grouping is used and a single preprocessing step
is performed for the whole data set.
</dd>
+
+ <dt>one_hot_encode_int_dep_var (optional)</dt>
+ <dd> BOOLEAN. default: FALSE.
+ A flag to decide whether to one-hot encode dependent variables that are
+scalar integers. This parameter is ignored if the dependent variable is not a
+scalar integer.
+
+@note The mini-batch preprocessor automatically encodes
+dependent variables that are boolean and character types such as text, char and
+varchar. However, scalar integers are a special case because they can be used
+in both classification and regression problems, so you must tell the mini-batch
+preprocessor whether you want to encode them or not. In the case that you have
+already encoded the dependent variable yourself, you can ignore this parameter.
+Also, if you want to encode float values for some reason, cast them to text
+first.
--- End diff --
+1 for the explanation.
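
For readers of the thread, the rule described in the note can be sketched roughly as follows. This is only an illustration in plain Python, not the MADlib implementation: the helper names `one_hot_encode` and `prepare_dep_var` are hypothetical, and only the flag name `one_hot_encode_int_dep_var` comes from the diff.

```python
def one_hot_encode(values):
    """Return one 0/1 row per value, with one column per distinct class (sorted)."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [[1 if index[v] == i else 0 for i in range(len(classes))]
            for v in values]


def prepare_dep_var(values, one_hot_encode_int_dep_var=False):
    """Hypothetical sketch of the encoding rule described in the note.

    - boolean and text values are always one-hot encoded
    - integers are encoded only when the flag is set
    - anything else (e.g. floats) is passed through unchanged
    """
    first = values[0]
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(first, (bool, str)):
        return one_hot_encode(values)
    if isinstance(first, int) and one_hot_encode_int_dep_var:
        return one_hot_encode(values)
    return list(values)


# Integer labels: classification (encode) vs. regression (leave as-is).
encoded = prepare_dep_var([2, 0, 1, 2], one_hot_encode_int_dep_var=True)
untouched = prepare_dep_var([2, 0, 1, 2])

# Floats are not encoded even with the flag set, unless cast to text first.
floats_raw = prepare_dep_var([1.5, 2.5, 1.5], one_hot_encode_int_dep_var=True)
floats_as_text = prepare_dep_var([str(v) for v in [1.5, 2.5, 1.5]])
```

The flag only matters for the integer case, which matches the statement in the diff that the parameter is ignored for any other dependent variable type.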
---