fmcquillan99 commented on a change in pull request #445: updated DL preprocessor docs for bytea
URL: https://github.com/apache/madlib/pull/445#discussion_r330211862
 
 

 ##########
 File path: src/ports/postgres/modules/deep_learning/input_data_preprocessor.sql_in
 ##########
 @@ -209,25 +214,39 @@ validation_preprocessor_dl(source_table,
     validation_preprocessor_dl() contain the following columns:
     <table class="output">
       <tr>
-        <th>buffer_id</th>
-        <td>INTEGER. Unique id for each row in the packed table.
+        <th>independent_var</th>
+        <td>BYTEA. Packed array of independent variables in PostgreSQL bytea format.
         </td>
       </tr>
       <tr>
         <th>dependent_var</th>
-        <td>ANYARRAY[]. Packed array of dependent variables.
+        <td>BYTEA. Packed array of dependent variables in PostgreSQL bytea format.
         The dependent variable is always one-hot encoded as an
-        INTEGER[] array. For now, we are assuming that
+        integer array. For now, we are assuming that
         input_preprocessor_dl() will be used
         only for classification problems using deep learning. So
         the dependent variable is one-hot encoded, unless it's already a
         numeric array in which case we assume it's already one-hot
-        encoded and just cast it to an INTEGER[] array.
+        encoded and just cast it to an integer array.
         </td>
       </tr>
       <tr>
-        <th>independent_var</th>
-        <td>REAL[]. Packed array of independent variables.
+        <th>independent_var_shape</th>
+        <td>INTEGER[]. Shape of the independent variable array after preprocessing.
+        The first element is the number of images packed per row, and subsequent
+        elements will depend on how the image is described (e.g., channels first or last).
+        </td>
+      </tr>
+      <tr>
+        <th>dependent_var_shape</th>
+        <td>INTEGER[]. Shape of the dependent variable array after preprocessing.
+        The first element is the number of images packed per row, and the second
+        element is the number of class values.
 
 Review comment:
   We already cover one-hot encoding in some detail in the description of
   `dependent_varname` in the function definition, so I am hoping that is
   sufficient. If not, please let me know.
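
   As a rough illustration of how the two shape columns relate to the packed data, here is a minimal Python sketch. It is not MADlib code: the helper names (`one_hot`, `flatten`, `shape_of`, `pack`), the sample data, and the element widths chosen for packing (typecode `f` for the independent variable, `h` for the dependent variable) are all assumptions made for the example.

```python
from array import array


def one_hot(labels, class_values):
    """One-hot encode labels against an ordered list of class values."""
    index = {value: i for i, value in enumerate(class_values)}
    width = len(class_values)
    return [[1 if index[label] == j else 0 for j in range(width)]
            for label in labels]


def flatten(nested):
    """Flatten an arbitrarily nested list into a flat list of scalars."""
    if not isinstance(nested, list):
        return [nested]
    flat = []
    for item in nested:
        flat.extend(flatten(item))
    return flat


def shape_of(nested):
    """Return the shape of a regular (non-ragged) nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return shape


def pack(nested, typecode):
    """Serialize a nested list to raw bytes (assumed stand-in for bytea)."""
    return array(typecode, flatten(nested)).tobytes()


# Hypothetical buffer: 3 images of 2x2 pixels, 1 channel, channels last.
images = [
    [[[0.0], [0.5]], [[1.0], [0.25]]],
    [[[0.1], [0.2]], [[0.3], [0.4]]],
    [[[0.9], [0.8]], [[0.7], [0.6]]],
]
labels = ["cat", "dog", "cat"]
class_values = ["cat", "dog"]

dep = one_hot(labels, class_values)

independent_var = pack(images, "f")   # raw bytes, like the BYTEA column
dependent_var = pack(dep, "h")        # raw bytes, like the BYTEA column
independent_var_shape = shape_of(images)
dependent_var_shape = shape_of(dep)

print(independent_var_shape)  # [3, 2, 2, 1]
print(dependent_var_shape)    # [3, 2]
```

   In this sketch the first element of each shape is the number of images packed in the row (3), and the second element of the dependent shape is the number of class values (2), which matches the column descriptions in the diff above. The raw bytes carry no dimension information of their own, which is why the shape is kept alongside them.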

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
