Re: [R] SVM coefficients

2009-09-01 Thread marlene marchena
Hi,



For a long time I have had problems running an SVM regression. Here is an
example with the Ozone data, which represents my own data very well.



data(Ozone, package = "mlbench")

# I drop the first three variables and split the data into two parts

Ozone <- na.omit(Ozone[, -(1:3)])

index <- 1:nrow(Ozone)

testset <- Ozone[104:203, ]

trainset <- Ozone[1:103, ]

names(Ozone)



library(e1071)

# train svm with RBF kernel and without scaling

tuneobj = tune.svm(V4 ~ ., data = trainset, gamma = 10^(-6:-3), cost = 10^(1:3))

summary(tuneobj)$best.parameters

svm.noscale <- svm(V4 ~ ., data = trainset, cost = 1000, gamma = 0.001, scale = FALSE)



Parameters:

   SVM-Type:  eps-regression

 SVM-Kernel:  radial

   cost:  1000

  gamma:  0.001

epsilon:  0.1





Number of Support Vectors:  101



# I get 101 support vectors, which seems bad because I have only 103 training
# observations.

# When I test with the trainset I get good results, but when I use the
# testset my predictions are pretty bad.



pred.noscale1 <- predict(svm.noscale, newdata = trainset, decision.values = TRUE)

crossprod(pred.noscale1 - trainset$V4) / 103

# [1,] 0.009827706



pred.noscale2 <- predict(svm.noscale, newdata = testset, decision.values = TRUE)

crossprod(pred.noscale2 - testset$V4) / 100

# [1,] 82.97046
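A small aside on the error measure used here: crossprod(pred - y)/n is the mean squared error, returned as a 1 x 1 matrix. A minimal sketch with made-up numbers (not the Ozone predictions; mse is a hypothetical helper name) showing the equivalent mean() form, which avoids hard-coding the number of rows:

```r
# Hypothetical helper, not part of e1071: mean squared error as a plain number.
mse <- function(pred, obs) mean((pred - obs)^2)

# Made-up predictions and observations, for illustration only.
pred <- c(2.0, 3.5, 5.0, 7.5)
obs  <- c(2.5, 3.0, 5.0, 8.0)

mse_matrix <- crossprod(pred - obs) / length(obs)  # 1 x 1 matrix, as in the post
mse_scalar <- mse(pred, obs)                        # same value as a scalar

print(mse_matrix)
print(mse_scalar)
```

Both forms give the same number; the mean() form simply adapts to whatever subset is passed in.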





# primal parameters

w <- t(svm.noscale$coefs) %*% svm.noscale$SV

           V5        V6       V7       V8       V9       V10      V11  V12       V13
[1,] 44187.34 -265.8382 3741.839 6359.768 5455.063 -646352.6 317.6211 6456 -23256.67

b = svm.noscale$rho

[1] -10.46065



# It seems that I am overfitting. I suppose the problem comes from not
# scaling the data (V5 and V10 are very high).
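One possible reason why unscaled inputs hurt an RBF kernel, sketched in base R on made-up numbers (not the Ozone data; rbf is a hypothetical helper). The radial kernel has the form exp(-gamma * ||u - v||^2), so a variable with a large numeric range can dominate the distance and make almost every pair of points look completely dissimilar, which would be consistent with nearly every training case becoming a support vector:

```r
# Made-up data: two features on very different scales (not the Ozone data).
x <- rbind(a = c(f1 = 0.1, f2 = 5000),
           b = c(f1 = 0.9, f2 = 5010),
           c = c(f1 = 0.1, f2 = 9000))

# RBF kernel k(u, v) = exp(-gamma * ||u - v||^2).
rbf <- function(u, v, gamma) exp(-gamma * sum((u - v)^2))

gamma <- 0.001

# Unscaled: the squared distance is dominated by f2.
k_raw <- c(ab = rbf(x["a", ], x["b", ], gamma),
           ac = rbf(x["a", ], x["c", ], gamma))

# Scaled to zero mean and unit variance, which is what svm() does by
# default (scale = TRUE).
xs <- scale(x)
k_scaled <- c(ab = rbf(xs["a", ], xs["b", ], gamma),
              ac = rbf(xs["a", ], xs["c", ], gamma))

print(k_raw)
print(k_scaled)
```

In the unscaled case the kernel value for the pair (a, c) collapses to zero, while after scaling all pairs remain comparable. Whether this is the whole story for the Ozone fit is an assumption; it is only meant to illustrate the mechanism.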

# Now scaling the data



svm.scale <- svm(V4 ~ ., data = trainset, cost = 1000, gamma = 0.001)



Parameters:

   SVM-Type:  eps-regression

 SVM-Kernel:  radial

   cost:  1000

  gamma:  0.001

epsilon:  0.1



Number of Support Vectors:  86



# It seems better



svm.pred1 <- predict(svm.scale, newdata = trainset, decision.values = TRUE)

crossprod(svm.pred1 - trainset$V4) / 103

# [1,] 9.459279



svm.pred2 <- predict(svm.scale, newdata = testset, decision.values = TRUE)

crossprod(svm.pred2 - testset$V4) / 100

# [1,] 26.51138





# primal parameters

w <- t(svm.scale$coefs) %*% svm.scale$SV

            V5        V6       V7       V8       V9      V10      V11     V12       V13
[1,] -89.03491 -22.88782 146.8991 56.09881 217.0120 43.01645 -8.27661 50.2729 -60.78473

b = svm.scale$rho

# [1] 18.42264



Looking only at prediction, the scaled model is good, but I’m mainly
interested in w. Is it possible to improve this model to get lower values
for w? Actually I’m trying to fit an SVM-GARCH model, and one condition of
the model is that the sum of the w’s is < 1 (in my model I have only two
independent variables).



If you have any idea how to improve the model, or if you find any problem
with it, please let me know.



Thanks in advance,


Marlene.




Re: [R] SVM coefficients

2009-09-01 Thread Steve Lianoglou

Hi Marlene,

I'm going to cut out much of your post and just cut to the chase:

On Sep 1, 2009, at 9:03 AM, marlene marchena wrote:

> Looking only to prediction purpose the scale model is good but I’m mainly
> interested in w. Is it possible to improve this model to get lower values to
> w? Actually I’m trying to run the SVM-GARCH and one condition to the model
> is that the sum of w’s < 1 (in my model I have only two independent variables).
>
> If you have any idea how to improve the model or if you find any problem
> with it please let me now.



In principle you should be able to do what you're after (of
course :-), but I'm pretty sure you won't be able to do this using
the e1071 package, since you're imposing a linear constraint on w (this
is almost like an l1 constraint w/o using the absolute values of w's
components, no?), while e1071::svm solves a convex problem with an l2
penalty on w.
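For reference, the textbook epsilon-SVR primal problem, which is the kind of problem solved for eps-regression, can be written as below. The last line is not part of that problem: it is one assumed way to write the extra condition on the sum of the w's, with the bound c as a placeholder.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad
  & \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right) \\
\text{s.t.}\quad
  & y_i - \langle w, \phi(x_i)\rangle - b \le \varepsilon + \xi_i, \\
  & \langle w, \phi(x_i)\rangle + b - y_i \le \varepsilon + \xi_i^*, \\
  & \xi_i,\ \xi_i^* \ge 0, \qquad i = 1,\dots,n \\[4pt]
\text{extra condition (assumed form):}\quad
  & \textstyle\sum_j w_j \le c
\end{aligned}
```

With a non-linear kernel such as RBF, w lives in the feature space induced by \phi, so a condition on one weight per input variable only has a direct meaning for a linear kernel.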


You say you're mainly interested in w, so are you looking for a
means of doing feature selection? You can stick with e1071 and try
recursive feature elimination (aka SVM-RFE; google it, you'll find
mucho), or you can rig up an l1-SVM, which is already implemented
for you in the penalizedSVM package (haven't used it myself):


cran: http://cran.r-project.org/web/packages/penalizedSVM/index.html
publication: http://bioinformatics.oxfordjournals.org/cgi/content/full/25/13/1711

Does that help?

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SVM coefficients

2009-08-31 Thread Noah Silverman

Steve,

That doesn't work.

I just trained an SVM with 80 variables.
svm_model$coefs gives me  a list of 10,000 items.  My training set is 
30,000 examples of 80 variables, so I have no idea what the 10,000 items 
represent.


There should be some attribute that lists the weights for each of the 
80 variables.


--
Noah



Re: [R] SVM coefficients

2009-08-31 Thread Achim Zeileis

On Mon, 31 Aug 2009, Noah Silverman wrote:

> Steve,
>
> That doesn't work.
>
> I just trained an SVM with 80 variables.
> svm_model$coefs gives me a list of 10,000 items. My training set is 30,000
> examples of 80 variables, so I have no idea what the 10,000 items represent.

Presumably, the coefficients of the support vectors times the training
labels, see help(svm, package = e1071). See also
  http://www.jstatsoft.org/v15/i09/
for some background information and the different formulations available.

> There should be some attribute that lists the weights for each of the 80
> variables.

Not sure what you are looking for. Maybe David, the author of svm() (and
now Cc), can help.

Z




Re: [R] SVM coefficients

2009-08-31 Thread Bernd Bischl

Noah Silverman wrote:

> Steve,
>
> That doesn't work.
>
> I just trained an SVM with 80 variables.
> svm_model$coefs gives me a list of 10,000 items. My training set is
> 30,000 examples of 80 variables, so I have no idea what the 10,000
> items represent.
>
> There should be some attribute that lists the weights for each of
> the 80 variables.




Hi Noah,
does this help?

library(e1071)

# make binary problem from iris
mydata <- iris[1:100, ]
mydata$Species <- mydata$Species[, drop = TRUE]
str(mydata)

#'data.frame':   100 obs. of  5 variables:
# $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
# $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
# $ Species     : Factor w/ 2 levels "setosa","versicolor": 1 1 1 1 1 1 1 1 1 1 ...


# inputs
X <- as.matrix(mydata[, -5])

# train svm with linear kernel,
# to make later stuff easier we don't scale
m <- svm(Species ~ ., data = mydata, kernel = "linear", scale = FALSE)

# Number of Support Vectors:  3

# we get 3 support vectors, these are weights for training cases
# or in svm theory speak: our dual variables alpha
# (e1071 stores them already multiplied by the class label)
m$coefs[, 1]
# [1]  0.67122500  0.07671148 -0.74793648

# these are the indices of the cases to which the alphas belong
m$index
# [1] 24 42 99

# let's calculate the primal variables from the dual ones
# svm theory says
# w = sum x_i alpha_i
w <- t(m$coefs) %*% X[m$index, ]
#      Sepal.Length Sepal.Width Petal.Length Petal.Width
# [1,]  -0.04602689   0.5216377    -1.003002  -0.4641042

# test whether the above was nonsense.
# e1071 predict
p1 <- predict(m, newdata = mydata, decision.values = TRUE)
p1 <- attr(p1, "decision.values")
# do it manually with w, simple linear predictor with intercept -m$rho
p2 <- X %*% t(w) - m$rho

# puuuh, lucky
max(abs(p1 - p2))
# [1] 6.439294e-15
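
If the aim is the kind of "relative weight per variable" view mentioned earlier in the thread, one rough option for a linear kernel is to rank the variables by the absolute value of w. A minimal sketch in base R, with the w values copied by hand from the output above (rel and ranking are hypothetical names; the shares are only comparable when the variables are on similar scales, and this has no direct analogue for an RBF kernel):

```r
# w as printed above for the linear-kernel iris model (copied by hand).
w <- c(Sepal.Length = -0.04602689, Sepal.Width = 0.5216377,
       Petal.Length = -1.003002,   Petal.Width = -0.4641042)

# Share of each variable in the total absolute weight.
rel <- abs(w) / sum(abs(w))

# Largest share first.
ranking <- sort(rel, decreasing = TRUE)
print(round(ranking, 3))
```

This is a heuristic view of the fitted hyperplane, not a significance measure.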


Bernd



Re: [R] SVM coefficients

2009-08-31 Thread Steve Lianoglou

Hi,

On Aug 31, 2009, at 3:32 AM, Noah Silverman wrote:


> Steve,
>
> That doesn't work.


Actually, it does :-)


> I just trained an SVM with 80 variables.
> svm_model$coefs gives me a list of 10,000 items. My training set
> is 30,000 examples of 80 variables, so I have no idea what the
> 10,000 items represent.
>
> There should be some attribute that lists the weights for each of
> the 80 variables.


No, not really.

The coefficients that you're pulling out are the weights for the  
support vectors. These aren't the coefficients you're expecting as in  
the normal linear model case, or whatever.


I guess you're using the RBF kernel, right? The 80 variables that  
you're using are being transformed into some higher dimensional space,  
so the 80 weights you expect to get back don't really exist in the way  
you're expecting.
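
To make this concrete, a minimal sketch in base R with made-up support vectors and coefficients (hypothetical values, not output from e1071; SV, coefs, rho and gamma only mimic the names of the fitted object's components, and decision is a hypothetical helper). There is one coefficient per support vector, which is why coefs can have 10,000 entries for 80 variables. With a linear kernel the expansion collapses to one weight per variable; with an RBF kernel it does not:

```r
# Made-up "trained model" pieces (hypothetical values, not e1071 output).
SV    <- rbind(c(1, 2), c(2, 0), c(0, 1))  # support vectors, one per row
coefs <- c(0.7, 0.1, -0.8)                 # one coefficient per support vector
rho   <- 0.25                              # offset; the intercept is -rho
gamma <- 0.5

rbf    <- function(u, v) exp(-gamma * sum((u - v)^2))
linear <- function(u, v) sum(u * v)

# Kernel expansion: f(x) = sum_i coefs_i * K(sv_i, x) - rho
decision <- function(x, kernel) {
  sum(coefs * apply(SV, 1, kernel, v = x)) - rho
}

x_new <- c(1.5, 0.5)

# Linear kernel: the expansion equals a plain weight vector in input space.
w <- colSums(coefs * SV)                 # w = sum_i coefs_i * sv_i
f_lin_kernel <- decision(x_new, linear)
f_lin_w      <- sum(w * x_new) - rho

# RBF kernel: no such w exists in input space; the support vectors
# themselves are needed to evaluate f.
f_rbf <- decision(x_new, rbf)

print(c(f_lin_kernel = f_lin_kernel, f_lin_w = f_lin_w, f_rbf = f_rbf))
```

The two linear-kernel values agree, while the RBF value can only be obtained through the kernel expansion.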


SVMs are great for accuracy, but notoriously hard for interpretation.

To try and squeeze some interpretability from your classifier in your  
feature space, you might try to look at the weights over your w vector:


http://www.nabble.com/How-to-get-w-and-b-in-SVR--%28package-e1071%29-td24790413.html#a24791423

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] SVM coefficients

2009-08-31 Thread Noah Silverman
Thanks,

I just remember with RapidMiner, there was always a screen showing the 
effective weights assigned to each input variable by the SVM.  These 
numbers themselves weren't good for much, except they really helped to 
visualize the data.  It is rather useful to see how much relative weight
(significance) the SVM assigned to each variable.




Re: [R] SVM coefficients

2009-08-30 Thread Steve Lianoglou
Hi,

On Sun, Aug 30, 2009 at 6:10 PM, Noah Silvermann...@smartmediacorp.com wrote:
> Hello,
>
> I'm using the svm function from the e1071 package.
>
> It works well and gives me nice results.
>
> I'm very curious to see the actual coefficients calculated for each input
> variable.  (Other packages, like RapidMiner, show you this automatically.)
>
> I've tried looking at attributes for the model and do see a coefficients
> item, but printing it returns an NULL result.

Hmm .. I don't see a coefficients attribute, but rather a coefs
attribute, which I guess is what you're looking for (?)

Run example(svm) to its end and type:

R> m$coefs
             [,1]
 [1,]  1.00884130
 [2,]  1.27446460
 [3,]  2.
 [4,] -1.
 [5,] -0.35480340
 [6,] -0.74043692
 [7,] -0.87635311
 [8,] -0.04857869
 [9,] -0.03721980
[10,] -0.64696793
[11,] -0.57894605

HTH,

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
