[ https://issues.apache.org/jira/browse/MADLIB-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan updated MADLIB-1364:
------------------------------------
Description:
(1)
Confusing error message when the source table has not been preprocessed
{code}
SELECT madlib.madlib_keras_fit('train_lt5', -- source table (NOT PREPROCESSED)
    'mnist_model', -- model output table
    'model_arch_library', -- model arch table
    1, -- model arch id
    $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$, -- compile_params
    $$ batch_size=batch_size, epochs=1 $$, -- fit_params
    5, -- num_iterations
    0, -- gpus_per_host
    'test_lt5_packed', -- validation table
    1 -- metrics_compute_frequency
);
InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit error:
Input table 'train_lt5_summary' does not exist (plpython.c:5038)
{code}
A better message would be:
{code}
InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit error:
Input table 'train_lt5_summary' does not exist. Please ensure that the source
table you specify has been preprocessed by the image preprocessor.
(plpython.c:5038)
{code}
(2)
Confusing error message when the validation table has not been preprocessed
{code}
SELECT madlib.madlib_keras_fit('train_lt5_packed', -- source table (YES PREPROCESSED)
    'mnist_model', -- model output table
    'model_arch_library', -- model arch table
    1, -- model arch id
    $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$, -- compile_params
    $$ batch_size=batch_size, epochs=1 $$, -- fit_params
    5, -- num_iterations
    0, -- gpus_per_host
    'test_lt5', -- validation table (NOT PREPROCESSED)
    1 -- metrics_compute_frequency
);
InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit: invalid
independent_varname ('independent_var') for table (test_lt5). (plpython.c:5038)
CONTEXT: Traceback (most recent call last):
PL/Python function "madlib_keras_fit", line 21, in <module>
madlib_keras.fit(**globals())
PL/Python function "madlib_keras_fit", line 42, in wrapper
PL/Python function "madlib_keras_fit", line 71, in fit
PL/Python function "madlib_keras_fit", line 233, in __init__
PL/Python function "madlib_keras_fit", line 274, in _validate_input_args
PL/Python function "madlib_keras_fit", line 288, in _validate_validation_table
PL/Python function "madlib_keras_fit", line 242, in _validate_input_table
PL/Python function "madlib_keras_fit", line 96, in _assert
PL/Python function "madlib_keras_fit"
[SQL: "SELECT madlib.madlib_keras_fit('train_lt5_packed', -- source
table\n 'mnist_model', -- model output
table\n 'model_arch_library', -- model arch
table\n 1, -- model arch id\n
$$ loss='categorical_crossentropy',
optimizer='adadelta', metrics=['accuracy']$$, -- compile_params\n
$$ batch_size=batch_size, epochs=1 $$, -- fit_params\n
5, -- num_iterations\n
0, -- gpus_per_host\n
'test_lt5', -- validation table\n
1 -- metrics_compute_frequency\n
);"]
{code}
A better message would be:
{code}
InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit: invalid
independent_varname ('independent_var') for table (test_lt5). Please ensure
that this table has been preprocessed by the image preprocessor.
(plpython.c:5038)
{code}
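Both proposed messages follow the same pattern: keep the existing error text and append a hint about preprocessing. A minimal sketch of that pattern is below. It is only an illustration: `ValidationError`, `assert_with_hint` and `validate_input_table` are hypothetical names, not MADlib's actual helpers, and the exception class stands in for plpy.Error so the snippet runs outside the database.

```python
# Hypothetical sketch; none of these names are MADlib's real API.
PREPROCESS_HINT = ("Please ensure that this table has been preprocessed "
                   "by the image preprocessor.")


class ValidationError(Exception):
    """Stand-in for plpy.Error so the sketch runs outside PL/Python."""


def assert_with_hint(condition, message, hint=PREPROCESS_HINT):
    """Raise ValidationError with `message` followed by `hint` if `condition` is false."""
    if not condition:
        raise ValidationError("{0} {1}".format(message, hint))


def validate_input_table(table_name, existing_columns,
                         independent_varname="independent_var"):
    """Check that the expected packed column exists in the table's column list."""
    assert_with_hint(
        independent_varname in existing_columns,
        "madlib_keras_fit: invalid independent_varname ('{0}') "
        "for table ({1}).".format(independent_varname, table_name))


if __name__ == "__main__":
    try:
        # An unprocessed table only has the raw columns id, x and y.
        validate_input_table("test_lt5", ["id", "x", "y"])
    except ValidationError as err:
        print(err)
```

Keeping the hint in one constant would let the source table check and the validation table check share the same wording.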
was:
(1)
Input shape checking
We added input shape checking, which is a good idea in principle, but it seems
to be too restrictive. For example, for the MNIST data set, the Keras input shape is:
{code}
x_train_lt5.shape
(30596, 28, 28)
{code}
In MADlib, before preprocessing, a row looks like this:
{code}
id | 2238
x |
{{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12,196,195,12,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,79,159,44,0,0,0,0,39,253,218,10,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,221,253,179,0,0,0,0,149,253,169,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,221,253,53,0,0,0,12,222,253,123,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,8,226,253,16,0,0,0,25,253,253,56,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,50,253,253,16,0,0,0,41,253,218,7,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,139,253,217,8,0,0,0,126,253,193,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,213,253,114,0,0,0,10,226,253,130,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,39,250,253,223,10,0,0,17,253,253,54,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,173,253,253,253,169,137,83,120,253,221,2,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,52,238,254,254,254,254,254,255,254,254,192,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,115,253,228,84,73,97,154,238,253,253,138,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,40,146,45,0,0,0,0,9,253,250,73,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,253,228,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,75,253,228,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,132,253,186,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,243,253,102,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,196,254,238,7,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,245,254,186,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,166,251,79,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}}
y | 4
{code}
A validation error is thrown when we run fit():
{code}
InternalError: (psycopg2.InternalError) plpy.Error: model_keras error: Input
shape [28, 28, 1] in the model architecture does not match the input shape [28,
28, None] of column independent_var in table train_lt5_packed. (plpython.c:5038)
CONTEXT: Traceback (most recent call last):
PL/Python function "madlib_keras_fit", line 21, in <module>
madlib_keras.fit(**globals())
PL/Python function "madlib_keras_fit", line 42, in wrapper
PL/Python function "madlib_keras_fit", line 102, in fit
PL/Python function "madlib_keras_fit", line 300, in validate_input_shapes
PL/Python function "madlib_keras_fit", line 86, in _validate_input_shapes
PL/Python function "madlib_keras_fit"
[SQL: "SELECT madlib.madlib_keras_fit('train_lt5_packed', -- source
table\n 'mnist_model', -- model output
table\n 'model_arch_library', -- model arch
table\n 1, -- model arch id\n
$$ loss='categorical_crossentropy',
optimizer='adadelta', metrics=['accuracy']$$, -- compile_params\n
$$ batch_size=batch_size, epochs=1 $$, -- fit_params\n
5, -- num_iterations\n
0, -- gpus_per_host\n
'test_lt5_packed', -- validation table\n
1 -- metrics_compute_frequency\n
);"]
{code}
This is too restrictive. I suggest we turn MADlib input shape validation off
for the time being and let the back end fail or not according to its own rules.
This applies to fit, evaluate and predict.
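To illustrate the mismatch, here is a small sketch in plain Python. It assumes the table-side shape is derived from the array dimensions and padded with None up to three entries, which is a guess based on the [28, 28, None] in the error above and not a description of MADlib's actual code. The helper names are hypothetical.

```python
def nested_shape(array, ndims=3):
    """Return the dimensions of a nested list, padded with None up to ndims."""
    dims = []
    level = array
    while isinstance(level, list):
        dims.append(len(level))
        level = level[0] if level else None
    return dims + [None] * (ndims - len(dims))


def add_channel_axis(image):
    """Turn a rows x cols grayscale image into rows x cols x 1."""
    return [[[pixel] for pixel in row] for row in image]


if __name__ == "__main__":
    model_input_shape = [28, 28, 1]           # shape declared in the model architecture
    image = [[0] * 28 for _ in range(28)]     # a 28 x 28 image stored without a channel axis

    print(nested_shape(image) == model_input_shape)                    # False
    print(nested_shape(add_channel_axis(image)) == model_input_shape)  # True
```

Under that assumption, a strict equality check rejects a 28 x 28 grayscale image even though it differs from the model's input only by a trailing channel dimension of size 1.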
(2)
Confusing error message when the source table has not been preprocessed
{code}
SELECT madlib.madlib_keras_fit('train_lt5', -- source table (NOT PREPROCESSED)
    'mnist_model', -- model output table
    'model_arch_library', -- model arch table
    1, -- model arch id
    $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$, -- compile_params
    $$ batch_size=batch_size, epochs=1 $$, -- fit_params
    5, -- num_iterations
    0, -- gpus_per_host
    'test_lt5_packed', -- validation table
    1 -- metrics_compute_frequency
);
InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit error:
Input table 'train_lt5_summary' does not exist (plpython.c:5038)
{code}
A better message would be:
{code}
InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit error:
Input table 'train_lt5_summary' does not exist. Please ensure that the source
table you specify has been preprocessed by the image preprocessor.
(plpython.c:5038)
{code}
(3)
Confusing error message when the validation table has not been preprocessed
{code}
SELECT madlib.madlib_keras_fit('train_lt5_packed', -- source table (YES PREPROCESSED)
    'mnist_model', -- model output table
    'model_arch_library', -- model arch table
    1, -- model arch id
    $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$, -- compile_params
    $$ batch_size=batch_size, epochs=1 $$, -- fit_params
    5, -- num_iterations
    0, -- gpus_per_host
    'test_lt5', -- validation table (NOT PREPROCESSED)
    1 -- metrics_compute_frequency
);
InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit: invalid
independent_varname ('independent_var') for table (test_lt5). (plpython.c:5038)
CONTEXT: Traceback (most recent call last):
PL/Python function "madlib_keras_fit", line 21, in <module>
madlib_keras.fit(**globals())
PL/Python function "madlib_keras_fit", line 42, in wrapper
PL/Python function "madlib_keras_fit", line 71, in fit
PL/Python function "madlib_keras_fit", line 233, in __init__
PL/Python function "madlib_keras_fit", line 274, in _validate_input_args
PL/Python function "madlib_keras_fit", line 288, in _validate_validation_table
PL/Python function "madlib_keras_fit", line 242, in _validate_input_table
PL/Python function "madlib_keras_fit", line 96, in _assert
PL/Python function "madlib_keras_fit"
[SQL: "SELECT madlib.madlib_keras_fit('train_lt5_packed', -- source
table\n 'mnist_model', -- model output
table\n 'model_arch_library', -- model arch
table\n 1, -- model arch id\n
$$ loss='categorical_crossentropy',
optimizer='adadelta', metrics=['accuracy']$$, -- compile_params\n
$$ batch_size=batch_size, epochs=1 $$, -- fit_params\n
5, -- num_iterations\n
0, -- gpus_per_host\n
'test_lt5', -- validation table\n
1 -- metrics_compute_frequency\n
);"]
{code}
A better message would be:
{code}
InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit: invalid
independent_varname ('independent_var') for table (test_lt5). Please ensure
that this table has been preprocessed by the image preprocessor.
(plpython.c:5038)
{code}
> Misc message and other items for 1.16 release
> ---------------------------------------------
>
> Key: MADLIB-1364
> URL: https://issues.apache.org/jira/browse/MADLIB-1364
> Project: Apache MADlib
> Issue Type: Improvement
> Components: Deep Learning
> Reporter: Frank McQuillan
> Assignee: Nikhil
> Priority: Minor
> Fix For: v1.16
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)