[GitHub] madlib pull request #259: Minibatch: Add one-hot encoding option for int

njayaram2 Tue, 10 Apr 2018 15:30:31 -0700

Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/259#discussion_r180576675
  
    --- Diff: src/ports/postgres/modules/utilities/minibatch_preprocessing.sql_in ---
    @@ -91,6 +92,22 @@ minibatch_preprocessor(
        When this value is NULL, no grouping is used and a single preprocessing step
        is performed for the whole data set.
       </dd>
    +
    +  <dt>one_hot_encode_int_dep_var (optional)</dt>
    +  <dd> BOOLEAN. default: FALSE.
    +  A flag to decide whether to one-hot encode dependent variables that are
    +scalar integers. This parameter is ignored if the dependent variable is not a
    +scalar integer.
    +
    +@note The mini-batch preprocessor automatically encodes
    +dependent variables that are boolean and character types such as text, char and
    +varchar.  However, scalar integers are a special case because they can be used
    +in both classification and regression problems, so you must tell the mini-batch
    +preprocessor whether you want to encode them or not. In the case that you have
    +already encoded the dependent variable yourself,  you can ignore this parameter.
    +Also, if you want to encode float values for some reason, cast them to text
    +first.
    --- End diff --
    
    +1 for the explanation.
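
For readers unfamiliar with the term, the behavior described in the note can be sketched in a few lines of Python. This is only an illustration of one-hot encoding an integer dependent variable and of the cast-to-text workaround for floats, not MADlib's implementation. `one_hot_encode` is a hypothetical helper written for this example, and ordering the classes by sorting them is an assumption.

```python
def one_hot_encode(labels):
    """Hypothetical helper: return (classes, rows) for a list of labels.

    classes is the sorted list of distinct labels (sorting is an assumption),
    and rows holds one 0/1 vector per input label.
    """
    classes = sorted(set(labels))
    position = {label: i for i, label in enumerate(classes)}
    rows = [
        [1 if position[label] == j else 0 for j in range(len(classes))]
        for label in labels
    ]
    return classes, rows


# Integer labels used for classification: each distinct value gets a column.
int_classes, int_encoded = one_hot_encode([2, 0, 1, 2])

# Float labels: cast to text first, then encode them like any other text label.
float_labels = [0.5, 1.5, 0.5]
text_classes, text_encoded = one_hot_encode([str(v) for v in float_labels])
```

In a regression problem the integer column would be left as-is, which is why the choice has to be made explicitly by the caller.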
