I'm using caret to assess classifier performance (and it's great!). However, I've found that my results differ between R 2.x and R 3.x: reported accuracies drop dramatically, from roughly 0.76 to roughly 0.40 at C = 1. I suspect that a code change to kernlab ksvm may be responsible (see version 5.16-24 here: http://cran.r-project.org/web/packages/caret/news.html). I get very different results between caret_5.15-61 + kernlab_0.9-17 and caret_5.17-7 + kernlab_0.9-19 (see below).
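As a quick sanity check on that suspicion, the two caret versions can be compared against 5.16-24 with base R's compareVersion(), which returns -1, 0 or 1. This is only a sketch: it shows that the two setups fall on either side of that release, not that the change listed there is the cause. The variable names are made up for illustration.

```r
# Illustrative check only: do the two caret versions straddle 5.16-24?
# compareVersion() comes from the utils package, which is attached by default.
suspect   <- "5.16-24"  # caret release mentioned in the news file
old_caret <- "5.15-61"  # version used with R 2.15.2
new_caret <- "5.17-7"   # version used with R 3.0.2

before <- compareVersion(old_caret, suspect)  # negative: older than the suspect release
after  <- compareVersion(new_caret, suspect)  # positive: newer than the suspect release

# TRUE when the old setup predates the release and the new one includes it
straddles <- (before < 0) && (after >= 0)

print(c(before = before, after = after))
print(straddles)
```

If straddles is TRUE, installing the older caret under R 3.0.2 (or the reverse) would be one way to tell a package change apart from an R change.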
Can anyone please shed any light on this? Thanks very much!

### To replicate:

require(repmis)  # For downloading from https
df <- source_data('https://dl.dropboxusercontent.com/u/47973221/data.csv', sep=',')
require(caret)
svm.m1 <- train(df[,-1], df[,1], method='svmRadial', metric='Kappa', tunelength=5,
                trControl=trainControl(method='repeatedcv', number=10, repeats=10, classProbs=TRUE))
svm.m1
sessionInfo()

### Results - R2.15.2

> svm.m1
1241 samples
   7 predictors
  10 classes: ‘O27479’, ‘O31403’, ‘O32057’, ‘O32059’, ‘O32060’, ‘O32078’, ‘O32089’, ‘O32663’, ‘O32668’, ‘O32676’

No pre-processing
Resampling: Cross-Validation (10 fold, repeated 10 times)

Summary of sample sizes: 1116, 1116, 1114, 1118, 1118, 1119, ...

Resampling results across tuning parameters:

  C     Accuracy  Kappa  Accuracy SD  Kappa SD
  0.25  0.684     0.63   0.0353       0.0416
  0.5   0.729     0.685  0.0379       0.0445
  1     0.756     0.716  0.0357       0.0418

Tuning parameter ‘sigma’ was held constant at a value of 0.247
Kappa was used to select the optimal model using the largest value.
The final values used for the model were C = 1 and sigma = 0.247.

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] e1071_1.6-1 class_7.3-5 kernlab_0.9-17 repmis_0.2.4 caret_5.15-61 reshape2_1.2.2 plyr_1.8 lattice_0.20-10 foreach_1.4.0 cluster_1.14.3

loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.2 digest_0.6.0 evaluate_0.4.3 formatR_0.7 grid_2.15.2 httr_0.2 iterators_1.0.6 knitr_1.1 RCurl_1.95-4.1 stringr_0.6.2 tools_2.15.2

### Results - R3.0.2

> require(caret)
> svm.m1 <- train(df[,-1],df[,1],method='svmRadial',metric='Kappa',tunelength=5,trControl=trainControl(method='repeatedcv', number=10, repeats=10, classProbs=TRUE))
Loading required package: class
Warning messages:
1: closing unused connection 4 (https://dl.dropboxusercontent.com/u/47973221/df.Rdata)
2: executing %dopar% sequentially: no parallel backend registered
> svm.m1
1241 samples
   7 predictors
  10 classes: ‘O27479’, ‘O31403’, ‘O32057’, ‘O32059’, ‘O32060’, ‘O32078’, ‘O32089’, ‘O32663’, ‘O32668’, ‘O32676’

No pre-processing
Resampling: Cross-Validation (10 fold, repeated 10 times)

Summary of sample sizes: 1118, 1117, 1115, 1117, 1116, 1118, ...

Resampling results across tuning parameters:

  C     Accuracy  Kappa  Accuracy SD  Kappa SD
  0.25  0.372     0.278  0.033        0.0371
  0.5   0.39      0.297  0.0317       0.0358
  1     0.399     0.307  0.0289       0.0323

Tuning parameter ‘sigma’ was held constant at a value of 0.2148907
Kappa was used to select the optimal model using the largest value.
The final values used for the model were C = 1 and sigma = 0.215.

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] e1071_1.6-1 class_7.3-9 kernlab_0.9-19 repmis_0.2.6.2 caret_5.17-7 reshape2_1.2.2 plyr_1.8 lattice_0.20-24 foreach_1.4.1 cluster_1.14.4

loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_3.0.2 digest_0.6.3 grid_3.0.2 httr_0.2 iterators_1.0.6 RCurl_1.95-4.1 stringr_0.6.2 tools_3.0.2

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.