Frank McQuillan created MADLIB-1250:
---------------------------------------

             Summary: Can't generate cross validation table for SVM
                 Key: MADLIB-1250
                 URL: https://issues.apache.org/jira/browse/MADLIB-1250
             Project: Apache MADlib
          Issue Type: Bug
          Components: Module: Support Vector Machines
            Reporter: Frank McQuillan
             Fix For: v1.15



SVM does provide cross validation (CV):

1) The CV results table can be obtained by setting validation_result in the 
params argument. The table name is arbitrary; <output_table>_cv is one 
option.

2) The _summary table reports the best cross-validated parameter values, which 
correspond to the model in the output table. This gives the user the exact 
parameters needed to recreate the model. Whether that is the purpose of the 
summary table is open for debate.

3) The docs are definitely missing examples for CV.

But there seems to be a bug. Create the input table:

{code}
DROP TABLE IF EXISTS houses;
CREATE TABLE houses (id INT, tax INT, bedroom INT, bath FLOAT, price INT,
            size INT, lot INT);
INSERT INTO houses VALUES   
  (1 ,  590 ,       2 ,    1 ,  50000 ,  770 , 22100),
  (2 , 1050 ,       3 ,    2 ,  85000 , 1410 , 12000),
  (3 ,   20 ,       3 ,    1 ,  22500 , 1060 ,  3500),
  (4 ,  870 ,       2 ,    2 ,  90000 , 1300 , 17500),
  (5 , 1320 ,       3 ,    2 , 133000 , 1500 , 30000),
  (6 , 1350 ,       2 ,    1 ,  90500 ,  820 , 25700),
  (7 , 2790 ,       3 ,  2.5 , 260000 , 2130 , 25000),
  (8 ,  680 ,       2 ,    1 , 142500 , 1170 , 22000),
  (9 , 1840 ,       3 ,    2 , 160000 , 1500 , 19000),
 (10 , 3680 ,       4 ,    2 , 240000 , 2790 , 20000),
 (11 , 1660 ,       3 ,    1 ,  87000 , 1030 , 17500),
 (12 , 1620 ,       3 ,    2 , 118600 , 1250 , 20000),
 (13 , 3100 ,       3 ,    2 , 140000 , 1760 , 38000),
 (14 , 2070 ,       2 ,    3 , 148000 , 1550 , 14000),
 (15 ,  650 ,       3 ,  1.5 ,  65000 , 1450 , 12000);
{code}

Run training with CV:

{code}
DROP TABLE IF EXISTS houses_svm_gaussian_regression, 
houses_svm_gaussian_regression_summary, houses_svm_gaussian_regression_random, 
houses_svm_gaussian_regression_cv;
SELECT madlib.svm_regression( 'houses',
                              'houses_svm_gaussian_regression',
                              'price',
                              'ARRAY[1, tax, bath, size]',
                              'gaussian',
                              'n_components=10',
                              '',
                              'init_stepsize=[0.01, 1], max_iter=200, validation_result=houses_svm_gaussian_regression_cv, n_folds=3'
                           );
SELECT * FROM houses_svm_gaussian_regression_cv;
{code}

This results in an error:

{code}
InternalError: (psycopg2.InternalError) KeyError: 'params_dict' (plpython.c:4960)
CONTEXT:  Traceback (most recent call last):
  PL/Python function "svm_regression", line 23, in <module>
    return svm.svm(**globals())
  PL/Python function "svm_regression", line 970, in svm
  PL/Python function "svm_regression", line 1033, in _cross_validate_svm
  PL/Python function "svm_regression", line 146, in output_tbl
PL/Python function "svm_regression"
 [SQL: "SELECT madlib.svm_regression( 'houses',\n                              
'houses_svm_gaussian_regression',\n                              'price',\n     
                         'ARRAY[1, tax, bath, size]',\n                         
     'gaussian',\n                              'n_components=10',\n            
                  '',\n                              'init_stepsize=[0.01, 1], 
max_iter=200, validation_result=houses_svm_gaussian_regression_cv, n_folds=3'\n 
                          );"]
{code}
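
One way to narrow this down, assuming the failure is limited to writing the 
validation_result table (the traceback ends in output_tbl, but this report 
does not confirm that): rerun the same call without validation_result and 
inspect the _summary table. This is only a sketch and has not been run against 
the affected build.

{code}
-- Assumption: cross validation still runs when validation_result is omitted,
-- because init_stepsize lists more than one candidate value and n_folds=3.
DROP TABLE IF EXISTS houses_svm_gaussian_regression,
                     houses_svm_gaussian_regression_summary,
                     houses_svm_gaussian_regression_random;
SELECT madlib.svm_regression( 'houses',
                              'houses_svm_gaussian_regression',
                              'price',
                              'ARRAY[1, tax, bath, size]',
                              'gaussian',
                              'n_components=10',
                              '',
                              'init_stepsize=[0.01, 1], max_iter=200, n_folds=3'
                           );
-- Per point 2 above, this table should report the best cross-validated
-- parameter for the model stored in the output table.
SELECT * FROM houses_svm_gaussian_regression_summary;
{code}

If this call succeeds, that would point to the step that writes the CV results 
table rather than to the cross validation itself.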



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
