[
https://issues.apache.org/jira/browse/MADLIB-605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Frank McQuillan closed MADLIB-605.
----------------------------------
Resolution: Fixed
Resolved by writing a new SVM from scratch for v1.9.
Closing this JIRA.
> SVM Regression: Accuracy should be improved for some data sets
> --------------------------------------------------------------
>
> Key: MADLIB-605
> URL: https://issues.apache.org/jira/browse/MADLIB-605
> Project: Apache MADlib
> Issue Type: Bug
> Reporter: Jiali Yao
> Assignee: Rahul Iyer
> Priority: Critical
> Labels: severity_set
> Fix For: v1.9
>
>
> We ran comparable test cases in both MADlib and libsvm and compared the mean
> squared error (MSE).
> The following data sets score worse in MADlib than in libsvm:
> 1. Kernel function is dot:
> {code}
> Data Sets   MADlib(Parallel = true)   MADlib(Parallel = false)   libsvm     Madlib/libsvm
> bodyfat     249962.1847               4397613.616                4.68E-05   5336898594
> mpg         239380954.3               1.89706E+11                22.5239    10627864.37
> Test case:
> SELECT madlib.svm_regression
> ( 'madlibtestdata.svm_bodyfat'::text --input_table
> , 'madlibtestresult.reg_model_table'::text --model_table
> , 'true'::boolean --parallel
> , 'madlib.svm_dot'::text --kernel_func
> , 'false'::boolean --verbose
> , '0.1'::float8 --eta
> , '0.005'::float8 --nu
> , '0.05'::float8 --slambda
> ) AS q;
> SELECT madlib.svm_regression
> ( 'madlibtestdata.svm_mpg'::text --input_table
> , 'madlibtestresult.reg_model_table'::text --model_table
> , 'true'::boolean --parallel
> , 'madlib.svm_dot'::text --kernel_func
> , 'false'::boolean --verbose
> , '0.1'::float8 --eta
> , '0.005'::float8 --nu
> , '0.05'::float8 --slambda
> ) AS q;
> {code}
> 2. Kernel function is polynomial:
> {code}
> Data Sets   MADlib(Parallel = true)   MADlib(Parallel = false)   libsvm       Madlib/libsvm
> bodyfat     4.07E+26                  1.86E+27                   0.00143458   2.83446E+29
> cpusmall    2.38E+71                  4.41E+72                   1.42E+42     1.67986E+29
> housing     9.31E+29                  6.79E+31                   249267       3.73671E+24
> mpg         2.25E+37                  9.89E+39                   610.474      3.68346E+34
> Test case example:
> SELECT madlib.svm_regression
> ( 'madlibtestdata.svm_bodyfat'::text --input_table
> , 'madlibtestresult.reg_model_table'::text --model_table
> , 'true'::boolean --parallel
> , 'madlibtestdata.svm_polynomial'::text --kernel_func
> , 'false'::boolean --verbose
> , '0.1'::float8 --eta
> , '0.005'::float8 --nu
> , '0.05'::float8 --slambda
> ) AS q;
> {code}
> 3. Data sets
> {code}
> Data Sets Name   TrainSize   Attr   URL
> abalone          4,177       8      http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#abalone
> bodyfat          252         14     http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#bodyfat
> cpusmall         8,192       12     http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#cpusmall
> housing          506         13     http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#housing
> {code}
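For anyone re-checking the figures quoted above, here is a minimal sketch of the comparison metric. It assumes the Madlib/libsvm column is the MADlib(Parallel = true) MSE divided by the libsvm MSE, which is what the reported numbers are consistent with but is not stated in the issue. The helper names mean_squared_error and mse_ratio are hypothetical and are not part of MADlib or libsvm.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between observed and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mse_ratio(madlib_mse, libsvm_mse):
    """Ratio of MADlib MSE to libsvm MSE (assumed meaning of 'Madlib/libsvm')."""
    return madlib_mse / libsvm_mse


# Dot-kernel figures copied from the issue:
# (MADlib parallel=true MSE, libsvm MSE, reported Madlib/libsvm ratio)
reported = {
    "bodyfat": (249962.1847, 4.68e-05, 5336898594.0),
    "mpg": (239380954.3, 22.5239, 10627864.37),
}

for name, (madlib_mse, libsvm_mse, reported_ratio) in reported.items():
    ratio = mse_ratio(madlib_mse, libsvm_mse)
    # Small differences are expected because the table values are rounded.
    print(f"{name}: recomputed ratio = {ratio:.6g}, reported = {reported_ratio:.6g}")
```

A ratio near 1 would mean the two implementations fit equally well. The ratios in the issue are many orders of magnitude larger, which is why this was filed as an accuracy bug.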