fmcquillan99 edited a comment on issue #459: DL: Add support for asymmetric segment distribution to preprocessor
URL: https://github.com/apache/madlib/pull/459#issuecomment-557749546

Please review tests (0) and (5), which I think may need some changes.

(0)
I think `gpu_config` in the output table should be changed to `distribution_rules` to match the name of the input parameter.

(1)
CPUs only
```
DROP TABLE IF EXISTS image_data_packed, image_data_packed_summary;
SELECT madlib.training_preprocessor_dl('image_data',         -- Source table
                                       'image_data_packed',  -- Output table
                                       'species',            -- Dependent variable
                                       'rgb',                -- Independent variable
                                       NULL,                 -- Buffer size
                                       255,                  -- Normalizing constant
                                       NULL,
                                       'all_segments'
                                       );
SELECT * FROM image_data_packed_summary;
-[ RECORD 1 ]-------+------------------
source_table        | image_data
output_table        | image_data_packed
dependent_varname   | species
independent_varname | rgb
dependent_vartype   | text
class_values        | {bird,cat,dog}
buffer_size         | 26
normalizing_const   | 255
num_classes         | 3
gpu_config          | all_segments
```
OK

(2)
Table `xxx` exists but has the wrong format for a distribution table
```
DROP TABLE IF EXISTS image_data_packed, image_data_packed_summary;
SELECT madlib.training_preprocessor_dl('image_data',         -- Source table
                                       'image_data_packed',  -- Output table
                                       'species',            -- Dependent variable
                                       'rgb',                -- Independent variable
                                       NULL,                 -- Buffer size
                                       255,                  -- Normalizing constant
                                       NULL,
                                       'xxx'
                                       );
ERROR:  plpy.Error: training_preprocessor_dl: segments_to_use table must contain dbib column (plpython.c:5038)
CONTEXT:  Traceback (most recent call last):
  PL/Python function "training_preprocessor_dl", line 24, in <module>
    training_preprocessor_obj.training_preprocessor_dl()
  PL/Python function "training_preprocessor_dl", line 558, in training_preprocessor_dl
  PL/Python function "training_preprocessor_dl", line 271, in input_preprocessor_dl
  PL/Python function "training_preprocessor_dl", line 96, in _assert
PL/Python function "training_preprocessor_dl"
```
OK

(3)
Distribution table `yyy` does not exist
```
DROP TABLE IF EXISTS image_data_packed, image_data_packed_summary;
SELECT madlib.training_preprocessor_dl('image_data',         -- Source table
                                       'image_data_packed',  -- Output table
                                       'species',            -- Dependent variable
                                       'rgb',                -- Independent variable
                                       NULL,                 -- Buffer size
                                       255,                  -- Normalizing constant
                                       NULL,
                                       'yyy'
                                       );
ERROR:  plpy.Error: training_preprocessor_dl error: Input table 'yyy' does not exist. (plpython.c:5038)
DETAIL:  segments_to_use table (yyy) doesn't exist.
CONTEXT:  Traceback (most recent call last):
  PL/Python function "training_preprocessor_dl", line 24, in <module>
    training_preprocessor_obj.training_preprocessor_dl()
  PL/Python function "training_preprocessor_dl", line 558, in training_preprocessor_dl
  PL/Python function "training_preprocessor_dl", line 269, in input_preprocessor_dl
  PL/Python function "training_preprocessor_dl", line 674, in input_tbl_valid
PL/Python function "training_preprocessor_dl"
```
OK

(4)
Ask for GPUs but no GPUs on cluster
```
DROP TABLE IF EXISTS image_data_packed, image_data_packed_summary;
SELECT madlib.training_preprocessor_dl('image_data',         -- Source table
                                       'image_data_packed',  -- Output table
                                       'species',            -- Dependent variable
                                       'rgb',                -- Independent variable
                                       NULL,                 -- Buffer size
                                       255,                  -- Normalizing constant
                                       NULL,
                                       'gpu_segments'
                                       );
ERROR:  plpy.Error: training_preprocessor_dl: No GPUs configured on hosts. (plpython.c:5038)
CONTEXT:  Traceback (most recent call last):
  PL/Python function "training_preprocessor_dl", line 24, in <module>
    training_preprocessor_obj.training_preprocessor_dl()
  PL/Python function "training_preprocessor_dl", line 558, in training_preprocessor_dl
  PL/Python function "training_preprocessor_dl", line 243, in input_preprocessor_dl
PL/Python function "training_preprocessor_dl"
```
OK

(5)
Valid distribution table
```
DROP TABLE IF EXISTS segments_to_use;
CREATE TABLE segments_to_use AS
  SELECT DISTINCT dbid, hostname FROM gp_segment_configuration WHERE role='p' AND content>=0;
SELECT * FROM segments_to_use ORDER BY hostname, dbid;
 dbid |       hostname
------+-----------------------
    2 | pm-demo-machine-keras
    3 | pm-demo-machine-keras
(2 rows)

DROP TABLE IF EXISTS image_data_packed, image_data_packed_summary;
SELECT madlib.training_preprocessor_dl('image_data',         -- Source table
                                       'image_data_packed',  -- Output table
                                       'species',            -- Dependent variable
                                       'rgb',                -- Independent variable
                                       NULL,                 -- Buffer size
                                       255,                  -- Normalizing constant
                                       NULL,
                                       'segments_to_use'
                                       );
SELECT * FROM image_data_packed_summary;
-[ RECORD 1 ]-------+------------------
source_table        | image_data
output_table        | image_data_packed
dependent_varname   | species
independent_varname | rgb
dependent_vartype   | text
class_values        | {bird,cat,dog}
buffer_size         | 26
normalizing_const   | 255
num_classes         | 3
gpu_config          | {0,1}
```
The field `gpu_config` says `{0,1}`, which does not match the `dbid` values `{2,3}`. Where does `{0,1}` come from? I think we should report `dbid`, or else the user might get confused.
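
A possible explanation for the mismatch in (5), not confirmed against the PR's code: `{0,1}` may be segment content IDs (the `content` column of `gp_segment_configuration`) rather than `dbid` values. The query below is a minimal sketch for checking that on the test cluster. It uses only the catalog table and columns already referenced in test (5), and it assumes nothing about how the preprocessor derives `gpu_config`.

```sql
-- List each primary segment's dbid next to its content ID.
-- gp_segment_configuration is the Greenplum catalog table already queried in test (5).
SELECT dbid, content, hostname
FROM gp_segment_configuration
WHERE role = 'p' AND content >= 0
ORDER BY dbid;
```

If the `content` values for `dbid` 2 and 3 turn out to be 0 and 1, the summary table is reporting content IDs, and the question becomes which of the two identifiers is clearer to users.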
