kaknikhil commented on pull request #564:
URL: https://github.com/apache/madlib/pull/564#issuecomment-829732029


   Consolidating all the testing comments in one place 
   
   1. Calling non minibatched data with adam or rmsprop does not fail 
gracefully. We should throw a helpful error message
   ```
   SELECT madlib.mlp_classification( 'iris_data', 'mlp_model', 
'attributes','class_text',ARRAY[5], 'learning_rate_init=1.5, 
learning_rate_policy=constant, n_iterations=50, tolerance=0, rho=0.9, 
solver=rmsprop','tanh', NULL, FALSE, TRUE);
   
   ERROR:  spiexceptions.UndefinedColumn: column "attributes" does not exist
   LINE 8:                             (attributes)::DOUBLE PRECISION[]...
                                        ^
               SELECT
                   array_to_string(ARRAY[Null], ',') AS 
__madlib_temp_col_grp_key91737014_1619741694_53757093__,
                   NULL,
                   1 AS 
__madlib_temp_col_grp_iteration76439383_1619741694_35069372__,
                   (
                           madlib.mlp_alr_step(
                               (attributes)::DOUBLE PRECISION[],
                               
(ARRAY[(__madlib_temp_dep_var_norm96411952_1619741693_22150079__) = 
'Iris-setosa', (__madlib_temp_dep_var_norm96411952_1619741693_22150079__) = 
'Iris-versicolor', (__madlib_temp_dep_var_norm96411952_1619741693_22150079__) = 
'Iris-virginica']::INTEGER[])::DOUBLE PRECISION[],
                               
__madlib_temp_rel_state18952145_1619741694_43645987__.__madlib_temp_col_grp_state22584915_1619741694_5254854__,
                               ARRAY[ 4,40,3 ]::DOUBLE PRECISION[],
                               (1.5)::FLOAT8,
                               2,
                               1,
                               (1)::DOUBLE PRECISION,
                               (NULL::DOUBLE PRECISION[])::DOUBLE PRECISION[],
                               0,
                               1::integer,
                               1::integer,
                               1::integer,
                               0.9::FLOAT8,
                               0.9::FLOAT8,
                               0.999::FLOAT8,
                               1e-07::FLOAT8
                           )
                           ) AS 
__madlib_temp_col_grp_state22584915_1619741694_5254854__
               FROM (
                   SELECT *,
                       array_to_string(ARRAY[Null], ',') AS 
__madlib_temp_col_grp_key91737014_1619741694_53757093__
                   FROM 
__madlib_temp_tbl_data_scaled39150522_1619741693_1119413__
               ) AS _src
               JOIN ( SELECT grp_key AS 
__madlib_temp_col_grp_key91737014_1619741694_53757093__,state AS 
__madlib_temp_col_grp_state22584915_1619741694_5254854__ FROM 
madlib._gen_state($1, NULL, $2) ) AS 
__madlib_temp_rel_state18952145_1619741694_43645987__
               ON TRUE
               JOIN ( SELECT unnest($3) AS 
__madlib_temp_col_grp_key91737014_1619741694_53757093__, unnest($4) AS 
__madlib_temp_col_n_tuples6669658_1619741694_1063457__ ) AS _rel_n_tuples
               ON TRUE
   
   
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "mlp_classification", line 33, in <module>
       grouping_col)
     PL/Python function "mlp_classification", line 42, in wrapper
     PL/Python function "mlp_classification", line 381, in mlp
     PL/Python function "mlp_classification", line 576, in update
   PL/Python function "mlp_classification"
   
   ```
   
   2. While running mlp with verbose set to true with either rmsprop or adam, 
we print the wrong loss value to the console i.e. it always prints the epsilon 
value instead of the loss value
   
   I had to check the output table to get the actual loss value
   
   ```
   SELECT madlib.mlp_classification(
       'iris_data_packed',
       'mlp_model',
       'independent_varname',
       'dependent_varname',
       ARRAY[5],
       'learning_rate_init=0.1, learning_rate_policy=constant, n_iterations=10, 
solver=rmsprop',
       'tanh',
       NULL,
       FALSE,
       TRUE
   );
   INFO:  Iteration: 1, Loss: <1e-07>
   CONTEXT:  PL/Python function "mlp_classification"
   INFO:  Iteration: 2, Loss: <1e-07>
   CONTEXT:  PL/Python function "mlp_classification"
   INFO:  Iteration: 3, Loss: <1e-07>
   CONTEXT:  PL/Python function "mlp_classification"
   INFO:  Iteration: 4, Loss: <1e-07>
   CONTEXT:  PL/Python function "mlp_classification"
   INFO:  Iteration: 5, Loss: <1e-07>
   CONTEXT:  PL/Python function "mlp_classification"
   INFO:  Iteration: 6, Loss: <1e-07>
   CONTEXT:  PL/Python function "mlp_classification"
   INFO:  Iteration: 7, Loss: <1e-07>
   CONTEXT:  PL/Python function "mlp_classification"
   INFO:  Iteration: 8, Loss: <1e-07>
   CONTEXT:  PL/Python function "mlp_classification"
   INFO:  Iteration: 9, Loss: <1e-07>
   CONTEXT:  PL/Python function "mlp_classification"
   
   madlib=# SELECT loss FROM mlp_model ;
          loss
   -------------------
    0.924061498034721
   (1 row)
   
   
   SELECT madlib.mlp_classification(
       'iris_data_packed',
       'mlp_model',
       'independent_varname',
       'dependent_varname',
       ARRAY[5],
       'learning_rate_init=0.1, learning_rate_policy=constant, n_iterations=10, 
solver=rmsprop, eps=0.5',
       'tanh',
       NULL,
       FALSE,
       TRUE
   );
   INFO:  Iteration: 1, Loss: <0.5>
   CONTEXT:  PL/Python function "mlp_classification"
    mlp_classification
   --------------------
   
   (1 row)
   ```
   
   I believe the wrong loss value also messes up the tolerance calculation. If 
I do not set tolerance, it only runs 1 iteration
   ```
   SELECT madlib.mlp_classification(
       'iris_data_packed',
       'mlp_model',
       'independent_varname',
       'dependent_varname',
       ARRAY[5],
       'learning_rate_init=0.1, learning_rate_policy=constant, n_iterations=10, 
solver=rmsprop',
       'tanh',
       NULL,
       FALSE,
       TRUE
   );
   INFO:  Iteration: 1, Loss: <1e-07>
   CONTEXT:  PL/Python function "mlp_classification"
    mlp_classification
   --------------------
   
   (1 row)
   ```
   
   3. This following bug already exists in master but if it's easy to fix I 
wonder if we should fix it with this PR
   After running mlp_classification, there is a leftover temp table with name 
something like `__madlib_temp_rel_state54055087_1619745107_11423546__`. The 
problem is that sometimes this causes the subsequent mlp queries to fail


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to