Re: [R] SVM coefficients
Hi,

For a long time I have had problems running an SVM regression. Here is an example with the Ozone data, which represents my own data very well.

data(Ozone, package = "mlbench")
# Drop the first three variables and split the data into two parts
Ozone <- na.omit(Ozone[, -(1:3)])
index <- 1:nrow(Ozone)
testset <- Ozone[104:203, ]
trainset <- Ozone[1:103, ]
names(Ozone)

library(e1071)

# Train an SVM with RBF kernel, without scaling
tuneobj <- tune.svm(V4 ~ ., data = trainset, gamma = 10^(-6:-3), cost = 10^(1:3))
summary(tuneobj)$best.parameters
svm.noscale <- svm(V4 ~ ., data = trainset, cost = 1000, gamma = 0.001, scale = FALSE)

# Parameters:
#    SVM-Type:  eps-regression
#  SVM-Kernel:  radial
#        cost:  1000
#       gamma:  0.001
#     epsilon:  0.1
# Number of Support Vectors:  101

I get 101 support vectors, which seems bad because I have only 103 training observations. When I predict on the training set I get good results, but on the test set my predictions are pretty bad.

pred.noscale1 <- predict(svm.noscale, newdata = trainset, decision.values = TRUE)
crossprod(pred.noscale1 - trainset$V4) / 103
# [1,] 0.009827706
pred.noscale2 <- predict(svm.noscale, newdata = testset, decision.values = TRUE)
crossprod(pred.noscale2 - testset$V4) / 100
# [1,] 82.97046

# Primal parameters
w <- t(svm.noscale$coefs) %*% svm.noscale$SV
#            V5        V6       V7       V8       V9       V10      V11  V12       V13
# [1,] 44187.34 -265.8382 3741.839 6359.768 5455.063 -646352.6 317.6211 6456 -23256.67
b <- svm.noscale$rho
# [1] -10.46065

It seems that I am overfitting. I suppose the problem comes from not scaling the data (the values of V5 and V10 are very large).

# Now with scaling (the default)
svm.scale <- svm(V4 ~ ., data = trainset, cost = 1000, gamma = 0.001)

# Parameters:
#    SVM-Type:  eps-regression
#  SVM-Kernel:  radial
#        cost:  1000
#       gamma:  0.001
#     epsilon:  0.1
# Number of Support Vectors:  86

That seems better.

svm.pred1 <- predict(svm.scale, newdata = trainset, decision.values = TRUE)
crossprod(svm.pred1 - trainset$V4) / 103
# [1,] 9.459279
svm.pred2 <- predict(svm.scale, newdata = testset, decision.values = TRUE)
crossprod(svm.pred2 - testset$V4) / 100
# [1,] 26.51138

# Primal parameters
w <- t(svm.scale$coefs) %*% svm.scale$SV
#             V5        V6       V7       V8       V9      V10      V11     V12       V13
# [1,] -89.03491 -22.88782 146.8991 56.09881 217.0120 43.01645 -8.27661 50.2729 -60.78473
b <- svm.scale$rho
# [1] 18.42264

Looking only at prediction, the scaled model is good, but I'm mainly interested in w. Is it possible to improve this model to get lower values for w? Actually I'm trying to run an SVM-GARCH model, and one condition of the model is that the sum of the w's < 1 (in my model I have only two independent variables).

If you have any idea how to improve the model, or if you find any problem with it, please let me know.

Thanks in advance,

Marlene.
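The suspicion that the unscaled inputs are behind the overfitting can be illustrated without fitting any model. What follows is only a minimal sketch in base R on made-up toy data (the feature names and values are assumptions, not the Ozone data). With gamma = 0.001 and a feature whose values are in the thousands, the Gaussian kernel exp(-gamma * ||x_i - x_j||^2) is practically zero for almost every pair of distinct points, so each training point resembles only itself and the model can memorise the training set. After scaling, the kernel values are spread over a usable range.

```r
# Toy data (hypothetical): two features on very different scales
set.seed(1)
x <- cbind(small = rnorm(50),
           large = rnorm(50, mean = 5000, sd = 1500))

# Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
rbf_kernel <- function(x, gamma) {
  d2 <- as.matrix(dist(x))^2   # squared Euclidean distances
  exp(-gamma * d2)
}

gamma <- 0.001
K_raw    <- rbf_kernel(x, gamma)          # unscaled inputs
K_scaled <- rbf_kernel(scale(x), gamma)   # centred, unit-variance inputs

# Average similarity between distinct points
off_diag <- function(K) K[upper.tri(K)]
mean_raw    <- mean(off_diag(K_raw))
mean_scaled <- mean(off_diag(K_scaled))
print(c(raw = mean_raw, scaled = mean_scaled))
```

On the raw data the average off-diagonal kernel value is close to 0; on the scaled data it is close to 1. That is consistent with nearly every training observation becoming a support vector in the unscaled fit.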
Re: [R] SVM coefficients
Hi Marlene,

I'm going to cut out much of your post and just cut to the chase:

On Sep 1, 2009, at 9:03 AM, marlene marchena wrote:

> Looking only at prediction, the scaled model is good, but I'm mainly interested in w. Is it possible to improve this model to get lower values for w? Actually I'm trying to run an SVM-GARCH model, and one condition of the model is that the sum of the w's < 1 (in my model I have only two independent variables).
>
> If you have any idea how to improve the model, or if you find any problem with it, please let me know.

In principle you should be able to do what you're after (of course :-), but I'm pretty sure you won't be able to do this using the e1071 package, since you're imposing a linear constraint on w (this is almost like an l1 penalty without taking absolute values of w's components, no?), while e1071::svm solves a problem with an l2 penalty on w.

You say you're mainly interested in w, so are you looking for a means of doing feature selection?

You can stick with e1071 and try recursive feature elimination (aka SVM-RFE; google it, you'll find plenty), or you can rig up an l1-SVM, which is already implemented for you in the penalizedSVM package (I haven't used it myself):

cran: http://cran.r-project.org/web/packages/penalizedSVM/index.html
publication: http://bioinformatics.oxfordjournals.org/cgi/content/full/25/13/1711

Does that help?

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
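For readers who want to see what recursive feature elimination with e1071 might look like, here is a rough sketch, not a reference implementation. It assumes e1071 is installed, uses a two-class subset of iris as stand-in data, fits a linear-kernel SVM on the remaining features in each round, and drops the feature with the smallest squared weight. The variable names (`features`, `ranking`, `worst`) are made up for this illustration.

```r
library(e1071)

# Stand-in data (assumption): two-class subset of iris
mydata   <- droplevels(iris[1:100, ])
features <- setdiff(names(mydata), "Species")
ranking  <- character(0)   # most important feature ends up first

while (length(features) > 0) {
  x <- as.matrix(mydata[, features, drop = FALSE])
  m <- svm(x, mydata$Species, kernel = "linear", scale = TRUE)

  # Primal weights in the scaled feature space: w = sum_i coefs_i * SV_i
  w <- drop(t(m$coefs) %*% m$SV)
  names(w) <- features

  # Eliminate the feature with the smallest squared weight
  worst    <- names(which.min(w^2))
  ranking  <- c(worst, ranking)
  features <- setdiff(features, worst)
}

print(ranking)
```

Because the inputs are scaled before fitting, the weights are comparable across features. With unscaled inputs the magnitude of each weight would also reflect the units of its feature.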
Re: [R] SVM coefficients
Steve,

That doesn't work.

I just trained an SVM with 80 variables. svm_model$coefs gives me a list of 10,000 items. My training set is 30,000 examples of 80 variables, so I have no idea what the 10,000 items represent.

There should be some attribute that lists the weights for each of the 80 variables.

--
Noah
Re: [R] SVM coefficients
On Mon, 31 Aug 2009, Noah Silverman wrote:

> Steve,
>
> That doesn't work.
>
> I just trained an SVM with 80 variables. svm_model$coefs gives me a list of 10,000 items. My training set is 30,000 examples of 80 variables, so I have no idea what the 10,000 items represent.

Presumably, the coefficients of the support vectors times the training labels; see help(svm, package = "e1071"). See also http://www.jstatsoft.org/v15/i09/ for some background information and the different formulations available.

> There should be some attribute that lists the weights for each of the 80 variables.

Not sure what you are looking for. Maybe David, the author of svm() (and now Cc), can help.

Z
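A quick way to check what the rows of `coefs` correspond to is to compare its dimensions with the number of support vectors. The following is a small sketch, assuming e1071 is installed and using a two-class subset of iris as stand-in data: `coefs` has one row per support vector, not one entry per input variable, and `SV` holds the support vectors themselves with one column per input variable.

```r
library(e1071)

# Stand-in data (assumption): two-class subset of iris, 4 input variables
mydata <- droplevels(iris[1:100, ])
m <- svm(Species ~ ., data = mydata, kernel = "radial")

n_sv <- m$tot.nSV        # total number of support vectors
print(n_sv)
print(dim(m$coefs))      # n_sv rows, one column for a two-class problem
print(dim(m$SV))         # n_sv rows, one column per input variable
print(length(m$index))   # positions of the support vectors in the training data
```

Read this way, 10,000 rows in `coefs` for a training set of 30,000 examples would mean that about a third of the training examples ended up as support vectors.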
Re: [R] SVM coefficients
Noah Silverman wrote:

> Steve,
>
> That doesn't work.
>
> I just trained an SVM with 80 variables. svm_model$coefs gives me a list of 10,000 items. My training set is 30,000 examples of 80 variables, so I have no idea what the 10,000 items represent.
>
> There should be some attribute that lists the weights for each of the 80 variables.

Hi Noah,

does this help?

library(e1071)

# Make a binary problem from iris
mydata <- iris[1:100, ]
mydata$Species <- mydata$Species[, drop = TRUE]
str(mydata)
# 'data.frame': 100 obs. of 5 variables:
#  $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#  $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#  $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#  $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#  $ Species     : Factor w/ 2 levels "setosa","versicolor": 1 1 1 1 1 1 1 1 1 1 ...

# Inputs
X <- as.matrix(mydata[, -5])

# Train an SVM with a linear kernel;
# to make the later steps easier we don't scale
m <- svm(Species ~ ., data = mydata, kernel = "linear", scale = FALSE)
# Number of Support Vectors: 3

# We get 3 support vectors; these are weights for training cases,
# or in SVM theory speak: our dual variables alpha
m$coefs[, 1]
# [1]  0.67122500  0.07671148 -0.74793648

# These are the indices of the cases to which the alphas belong
m$index
# [1] 24 42 99

# Let's calculate the primal variables from the dual ones.
# SVM theory says: w = sum_i alpha_i x_i
w <- t(m$coefs) %*% X[m$index, ]
#      Sepal.Length Sepal.Width Petal.Length Petal.Width
# [1,]  -0.04602689   0.5216377    -1.003002  -0.4641042

# Test whether the above was nonsense.
# e1071 predict
p1 <- predict(m, newdata = mydata, decision.values = TRUE)
p1 <- attr(p1, "decision.values")

# Do it manually with w: a simple linear predictor with intercept -m$rho
p2 <- X %*% t(w) - m$rho

# Phew, lucky
max(abs(p1 - p2))
# [1] 6.439294e-15

Bernd
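The check above can be summarised in two formulas. This is a sketch in notation that is not from the thread: $c_i$ stands for the $i$-th entry of `m$coefs` (which, per the e1071 help page, is the dual coefficient times the training label), $x_i$ for the corresponding support vector, and $\rho$ for `m$rho`. The expressions apply to the two-class, linear-kernel, unscaled case used in the example.

```latex
% Primal weight vector recovered from the dual solution
w = \sum_{i \in \mathrm{SV}} c_i \, x_i ,
\qquad c_i = \alpha_i \, y_i

% Decision value for a new input x (intercept b = -\rho)
f(x) = w^{\top} x - \rho
```

The first line corresponds to `t(m$coefs) %*% X[m$index, ]` and the second to `X %*% t(w) - m$rho` in the code above.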
Re: [R] SVM coefficients
Hi,

On Aug 31, 2009, at 3:32 AM, Noah Silverman wrote:

> Steve,
>
> That doesn't work.

Actually, it does :-)

> I just trained an SVM with 80 variables. svm_model$coefs gives me a list of 10,000 items. My training set is 30,000 examples of 80 variables, so I have no idea what the 10,000 items represent.
>
> There should be some attribute that lists the weights for each of the 80 variables.

No, not really.

The coefficients that you're pulling out are the weights for the support vectors. These aren't the per-variable coefficients you would expect from, say, a normal linear model.

I guess you're using the RBF kernel, right? The 80 variables that you're using are being transformed into some higher-dimensional space, so the 80 weights you expect to get back don't really exist in the way you're expecting.

SVMs are great for accuracy, but notoriously hard to interpret.

To try to squeeze some interpretability out of your classifier in your feature space, you might look at the weights in your w vector:

http://www.nabble.com/How-to-get-w-and-b-in-SVR--%28package-e1071%29-td24790413.html#a24791423

-steve
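To make the point about the RBF kernel concrete: with a nonlinear kernel the fitted function is a weighted sum of kernel evaluations against the support vectors rather than a weighted sum of the input variables. The sketch below assumes e1071 is installed and again uses a two-class subset of iris as stand-in data with an arbitrarily chosen gamma. It rebuilds the decision values by hand from `SV`, `coefs`, `rho` and `gamma` and compares them with what predict() reports.

```r
library(e1071)

# Stand-in data (assumption): two-class subset of iris, no scaling
mydata <- droplevels(iris[1:100, ])
X <- as.matrix(mydata[, -5])
m <- svm(Species ~ ., data = mydata, kernel = "radial",
         gamma = 0.1, scale = FALSE)

# Decision values as reported by e1071
p1 <- attr(predict(m, newdata = mydata, decision.values = TRUE),
           "decision.values")

# Gaussian kernel between two vectors
rbf <- function(a, b, gamma) exp(-gamma * sum((a - b)^2))

# K[j, i] = kernel between observation j and support vector i
K <- apply(m$SV, 1, function(sv) apply(X, 1, rbf, b = sv, gamma = m$gamma))

# f(x) = sum_i coefs_i * K(sv_i, x) - rho
p2 <- K %*% m$coefs - m$rho

max_diff <- max(abs(p1 - p2))
print(max_diff)
```

The two sets of decision values should agree up to floating-point error. There is one coefficient per support vector and none per input variable, which is why no vector of 80 weights can be read off an RBF model.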
Re: [R] SVM coefficients
Thanks,

I just remember that with RapidMiner there was always a screen showing the effective weights assigned to each input variable by the SVM. These numbers themselves weren't good for much, except that they really helped to visualize the data. It is rather useful to see how much relative weight (significance) the SVM assigned to each variable.

--
Noah
Re: [R] SVM coefficients
Hi,

On Sun, Aug 30, 2009 at 6:10 PM, Noah Silverman <n...@smartmediacorp.com> wrote:

> Hello,
>
> I'm using the svm function from the e1071 package.
>
> It works well and gives me nice results.
>
> I'm very curious to see the actual coefficients calculated for each input variable. (Other packages, like RapidMiner, show you this automatically.)
>
> I've tried looking at attributes for the model and do see a "coefficients" item, but printing it returns a NULL result.

Hmm .. I don't see a "coefficients" attribute, but rather a "coefs" attribute, which I guess is what you're looking for (?)

Run example(svm) to its end and type:

R> m$coefs
             [,1]
 [1,]  1.00884130
 [2,]  1.27446460
 [3,]  2.
 [4,] -1.
 [5,] -0.35480340
 [6,] -0.74043692
 [7,] -0.87635311
 [8,] -0.04857869
 [9,] -0.03721980
[10,] -0.64696793
[11,] -0.57894605

HTH,
-steve