kaknikhil edited a comment on pull request #564: URL: https://github.com/apache/madlib/pull/564#issuecomment-829732029
Consolidating all the testing comments in one place 1. Calling non minibatched data with adam or rmsprop does not fail gracefully. We should throw a helpful error message ``` SELECT madlib.mlp_classification( 'iris_data', 'mlp_model', 'attributes','class_text',ARRAY[5], 'learning_rate_init=1.5, learning_rate_policy=constant, n_iterations=50, tolerance=0, rho=0.9, solver=rmsprop','tanh', NULL, FALSE, TRUE); ERROR: spiexceptions.UndefinedColumn: column "attributes" does not exist LINE 8: (attributes)::DOUBLE PRECISION[]... ^ SELECT array_to_string(ARRAY[Null], ',') AS __madlib_temp_col_grp_key91737014_1619741694_53757093__, NULL, 1 AS __madlib_temp_col_grp_iteration76439383_1619741694_35069372__, ( madlib.mlp_alr_step( (attributes)::DOUBLE PRECISION[], (ARRAY[(__madlib_temp_dep_var_norm96411952_1619741693_22150079__) = 'Iris-setosa', (__madlib_temp_dep_var_norm96411952_1619741693_22150079__) = 'Iris-versicolor', (__madlib_temp_dep_var_norm96411952_1619741693_22150079__) = 'Iris-virginica']::INTEGER[])::DOUBLE PRECISION[], __madlib_temp_rel_state18952145_1619741694_43645987__.__madlib_temp_col_grp_state22584915_1619741694_5254854__, ARRAY[ 4,40,3 ]::DOUBLE PRECISION[], (1.5)::FLOAT8, 2, 1, (1)::DOUBLE PRECISION, (NULL::DOUBLE PRECISION[])::DOUBLE PRECISION[], 0, 1::integer, 1::integer, 1::integer, 0.9::FLOAT8, 0.9::FLOAT8, 0.999::FLOAT8, 1e-07::FLOAT8 ) ) AS __madlib_temp_col_grp_state22584915_1619741694_5254854__ FROM ( SELECT *, array_to_string(ARRAY[Null], ',') AS __madlib_temp_col_grp_key91737014_1619741694_53757093__ FROM __madlib_temp_tbl_data_scaled39150522_1619741693_1119413__ ) AS _src JOIN ( SELECT grp_key AS __madlib_temp_col_grp_key91737014_1619741694_53757093__,state AS __madlib_temp_col_grp_state22584915_1619741694_5254854__ FROM madlib._gen_state($1, NULL, $2) ) AS __madlib_temp_rel_state18952145_1619741694_43645987__ ON TRUE JOIN ( SELECT unnest($3) AS __madlib_temp_col_grp_key91737014_1619741694_53757093__, unnest($4) AS __madlib_temp_col_n_tuples6669658_1619741694_1063457__ ) AS _rel_n_tuples ON TRUE CONTEXT: Traceback (most recent call last): PL/Python function "mlp_classification", line 33, in <module> grouping_col) PL/Python function "mlp_classification", line 42, in wrapper PL/Python function "mlp_classification", line 381, in mlp PL/Python function "mlp_classification", line 576, in update PL/Python function "mlp_classification" ``` 2. While running mlp with verbose set to true with either rmsprop or adam, we print the wrong loss value to the console i.e. it always prints the epsilon value instead of the loss value I had to check the output table to get the actual loss value ``` SELECT madlib.mlp_classification( 'iris_data_packed', 'mlp_model', 'independent_varname', 'dependent_varname', ARRAY[5], 'learning_rate_init=0.1, learning_rate_policy=constant, n_iterations=10, solver=rmsprop', 'tanh', NULL, FALSE, TRUE ); INFO: Iteration: 1, Loss: <1e-07> CONTEXT: PL/Python function "mlp_classification" INFO: Iteration: 2, Loss: <1e-07> CONTEXT: PL/Python function "mlp_classification" INFO: Iteration: 3, Loss: <1e-07> CONTEXT: PL/Python function "mlp_classification" INFO: Iteration: 4, Loss: <1e-07> CONTEXT: PL/Python function "mlp_classification" INFO: Iteration: 5, Loss: <1e-07> CONTEXT: PL/Python function "mlp_classification" INFO: Iteration: 6, Loss: <1e-07> CONTEXT: PL/Python function "mlp_classification" INFO: Iteration: 7, Loss: <1e-07> CONTEXT: PL/Python function "mlp_classification" INFO: Iteration: 8, Loss: <1e-07> CONTEXT: PL/Python function "mlp_classification" INFO: Iteration: 9, Loss: <1e-07> CONTEXT: PL/Python function "mlp_classification" madlib=# SELECT loss FROM mlp_model ; loss ------------------- 0.924061498034721 (1 row) SELECT madlib.mlp_classification( 'iris_data_packed', 'mlp_model', 'independent_varname', 'dependent_varname', ARRAY[5], 'learning_rate_init=0.1, learning_rate_policy=constant, n_iterations=10, solver=rmsprop, eps=0.5', 'tanh', NULL, FALSE, TRUE ); INFO: Iteration: 1, Loss: <0.5> CONTEXT: PL/Python function "mlp_classification" mlp_classification -------------------- (1 row) ``` I believe the wrong loss value also messes up the tolerance calculation. If I do not set tolerance, it only runs 1 iteration ``` SELECT madlib.mlp_classification( 'iris_data_packed', 'mlp_model', 'independent_varname', 'dependent_varname', ARRAY[5], 'learning_rate_init=0.1, learning_rate_policy=constant, n_iterations=10, solver=rmsprop', 'tanh', NULL, FALSE, TRUE ); INFO: Iteration: 1, Loss: <1e-07> CONTEXT: PL/Python function "mlp_classification" mlp_classification -------------------- (1 row) ``` 3. This following bug already exists in master but if it's easy to fix I wonder if we should fix it with this PR After running mlp_classification, there is a leftover temp table with name something like `__madlib_temp_rel_state54055087_1619745107_11423546__`. The problem is that sometimes this causes the subsequent mlp queries to fail 4. Lets say the user chooses 'adam' as the solver, should we still populate rho as 0.9 in the summary table or should it be NULL ? Same applies to other params as well -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org