followup: Re: [R] Issue with predict() for glm models

2004-09-23 Thread Paul Johnson
I have a follow-up question that fits with this thread.
Can you force an overlaid plot showing predicted values to follow the 
scaling of the axes of the plot over which it is laid?

Here is an example based on linear regression, just for clarity.  I have 
followed the procedure described below to create predictions and now 
want to plot the predicted values on top of a small section of the x-y 
scatterplot.

x <- rnorm(100, 10, 10)
e <- rnorm(100, 0, 5)
y <- 5 + 10 * x + e
myReg1 <- lm(y ~ x)
plot(x, y)
newX <- seq(1, 10, 1)
myPred <- predict(myReg1, data.frame(x = newX))

Now, if I do this, I get 2 graphs overlaid but their axes do not line 
up.

par(new=T)
plot(newX,myPred$fit)
The problem is that the second plot uses the whole width of the graph 
space, when I'd rather have it span only the small subset where its x 
is defined, from 1 to 10.  It's stretching the range (1,10) for newX to 
fill the same axis that runs from (-15, 35) where it plots x.

I know abline() can do this for lm, but for some other kinds of models, 
no lines() method is provided, and so I am doing this the old-fashioned 
way.

pj
John Fox wrote:
Dear Uwe, 


-Original Message-
From: Uwe Ligges [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 23, 2004 8:06 AM
To: John Fox
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [R] Issue with predict() for glm models

John Fox wrote:

Dear Uwe,

Unless I've somehow messed this up, as I mentioned yesterday, what you
suggest doesn't seem to work when the predictor is a matrix. Here's a
simplified example:

X <- matrix(rnorm(200), 100, 2)
y <- (X %*% c(1,2) + rnorm(100)) > 0
dat <- data.frame(y=y, X=X)
mod <- glm(y ~ X, family=binomial, data=dat)
new <- data.frame(X = matrix(rnorm(20),2))
predict(mod, new)

Dear John,
the questioner had one 2-column matrix with 40 observations and one with 50 
observations (not a 100-column matrix with 2 observations), and 
for those matrices it works ...


Indeed, and in my example the matrix predictor X has 2 columns and 100 rows;
I did screw up the matrix for the new data to be used for predictions (in
the example I sent today but not yesterday), but even when this is done
right -- where the new data has 10 rows and 2 columns -- there are 100 (not
10) predicted values:

X <- matrix(rnorm(200), 100, 2)  # original predictor matrix with 100 rows
y <- (X %*% c(1,2) + rnorm(100)) > 0
dat <- data.frame(y=y, X=X)
mod <- glm(y ~ X, family=binomial, data=dat)
new <- data.frame(X = matrix(rnorm(20),10, 2)) # corrected -- note 10 rows
predict(mod, new) # note 100 predicted values

          1           2           3           4           5           6 
 5.75238091  0.31874587 -3.00515893 -3.77282121 -1.97511221  0.54712914 
          7           8           9          10          11          12 
 1.85091226  4.38465524 -0.41028694 -1.53942869  0.57613555 -1.82761518 

 . . .
         91          92          93          94          95          96 
 0.36210780  1.71358713 -9.63612775 -4.54257576 -5.29740468  2.64363405 
         97          98          99         100 
-4.45478627 -2.44973209  2.51587537 -4.09584837 

Actually, I now see the source of the problem:
The data frames dat and new don't contain a matrix named X; rather the
matrix is split columnwise:

names(dat)
[1] "y"   "X.1" "X.2"
names(new)
[1] "X.1" "X.2"

Consequently, both glm and predict pick up the X in the global environment
(since there is none in dat or new), which accounts for why there are 100
predicted values.
Using list() rather than data.frame() produces the originally expected
behaviour:

new <- list(X = matrix(rnorm(20),10, 2))
predict(mod, new)

         1          2          3          4          5          6          7 
 5.9373064  0.3687360 -8.3793045  0.7645584 -2.6773842  2.4130547  0.7387318 
         8          9         10 
-0.4347916  8.4678728 -0.8976054 
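
A minimal, self-contained sketch of the behaviour described above, for anyone who wants to reproduce it. The object names new_df, new_list, p_df and p_list are made up for illustration and are not from the thread; set.seed() is only there so the run is repeatable. The first predict() call is expected to warn that newdata has 10 rows while the variables found have 100.

```r
set.seed(1)

X <- matrix(rnorm(200), 100, 2)
y <- (X %*% c(1, 2) + rnorm(100)) > 0

# data.frame() splits the matrix into columns X.1 and X.2,
# so there is no variable called X inside dat.
dat <- data.frame(y = y, X = X)
print(names(dat))

# The formula asks for X, which is therefore taken from the workspace.
mod <- glm(y ~ X, family = binomial, data = dat)

# Same splitting happens here: new_df holds X.1 and X.2, not X.
new_df <- data.frame(X = matrix(rnorm(20), 10, 2))

# A plain list keeps the matrix intact under the name X.
new_list <- list(X = matrix(rnorm(20), 10, 2))

p_df   <- predict(mod, new_df)    # falls back to the 100-row workspace X
p_list <- predict(mod, new_list)  # uses the 10-row matrix in the list

print(length(p_df))
print(length(p_list))
```

The two lengths printed at the end should differ in the way the thread describes: one prediction per row of the original X in the first case, one per row of the new matrix in the second.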

Regards,
 John

Best,
Uwe




           1            2            3            4            5            6 
  1.81224443  -5.92955128   1.98718051 -10.05331521   2.6506      -2.50635812 
           7            8            9           10           11           12 
  5.63728698  -0.94845276  -3.61657377  -1.63141320   5.03417372   1.80400271 
          13           14           15           16           17           18 
  9.32876273  -5.32723406   5.29373023  -3.90822713 -10.95065186   4.90038016 

 . . .
          97           98           99          100 
 -6.92509812   0.59357486  -1.17205723   0.04209578 

Note that there are 100 rather than 10 predicted values.
But with individual predictors (rather than a matrix),

x1 <- X[,1]
x2 <- X[,2]
dat.2 <- data.frame(y=y, x1=x1, x2=x2)
mod.2 <- glm(y ~ x1 + x2, family=binomial, data=dat.2)
new.2 <- data.frame(x1=rnorm(10), x2=rnorm(10))
predict(mod.2, new.2)

         1          2          3          4          5          6          7 
 6.5723823  0.6356392  4.0291018 -4.7914650  2.1435485 -3.1738096 -2.8261585 
         8          9         10 
-1.5255329 -4.7087592  4.0619290 

works as expected (?).
Regards,
John



RE: followup: Re: [R] Issue with predict() for glm models

2004-09-23 Thread Austin, Matt
Could you just use lines()?

lines(newX, myPred, col=2)


Re: followup: Re: [R] Issue with predict() for glm models

2004-09-23 Thread Marc Schwartz
On Thu, 2004-09-23 at 12:02, Paul Johnson wrote:
 I have a follow up question that fits with this thread.
 
 Can you force an overlaid plot showing predicted values to follow the 
 scaling of the axes of the plot over which it is laid?
 
 Here is an example based on linear regression, just for clarity.  I have 
 followed the procedure described below to create predictions and now 
 want to plot the predicted values on top of a small section of the x-y 
 scatterplot.
 
 x <- rnorm(100, 10, 10)
 e <- rnorm(100, 0, 5)
 y <- 5 + 10 * x + e
 
 myReg1 <- lm(y ~ x)
 plot(x, y)
 newX <- seq(1, 10, 1)
 myPred <- predict(myReg1, data.frame(x = newX))
 
 Now, if I do this, I get 2 graphs overlaid but their axes do not line 
 up.
 
 par(new=T)
 plot(newX,myPred$fit)
 
 The problem is that the second plot uses the whole width of the graph 
 space, when I'd rather have it span only the small subset where its x 
 is defined, from 1 to 10.  It's stretching the range (1,10) for newX to 
 fill the same axis that runs from (-15, 35) where it plots x.
 
 I know abline() can do this for lm, but for some other kinds of models, 
 no  lines() method is provided, and so I am doing this the old fashioned 
 way.

Paul,

Instead of using plot() for the second set of points, use points():

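x <- rnorm(100, 10, 10)
e <- rnorm(100, 0, 5)
y <- 5 + 10 * x + e

myReg1 <- lm(y ~ x)
plot(x, y)

newX <- seq(1, 10, 1)
myPred <- predict(myReg1, data.frame(x = newX))

# Called this way, predict() returns a plain numeric vector,
# so it is passed directly rather than as myPred$fit.
points(newX, myPred, pch = 19)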


This will preserve the axis scaling. If you use plot() without
explicitly indicating xlim and ylim, it will automatically scale the
axes based upon your new data, even if you indicated that the underlying
plot should not be cleared.
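
As a rough illustration of the point above, the sketch below records the plot region's user coordinates with par("usr") before and after adding the predictions, and then shows the par(new = TRUE) route with xlim and ylim set explicitly. The names fit, pred, usr_base, usr_after_points and usr_after_replot are invented for this example, and set.seed() is only there for repeatability.

```r
set.seed(1)

x <- rnorm(100, 10, 10)
e <- rnorm(100, 0, 5)
y <- 5 + 10 * x + e

fit  <- lm(y ~ x)
newX <- seq(1, 10, 1)
pred <- predict(fit, data.frame(x = newX))  # plain numeric vector

plot(x, y)
usr_base <- par("usr")   # user coordinates set up by the first plot

# points() draws into the existing coordinate system.
points(newX, pred, pch = 19)
usr_after_points <- par("usr")

# If plot() must be used on top, pass the limits of the original data
# so the second plot is scaled the same way as the first.
par(new = TRUE)
plot(newX, pred, xlim = range(x), ylim = range(y),
     axes = FALSE, xlab = "", ylab = "", col = "red")
usr_after_replot <- par("usr")

print(isTRUE(all.equal(usr_base, usr_after_points)))
print(isTRUE(all.equal(usr_base, usr_after_replot)))
```

Leaving out xlim and ylim in the second plot() call would rescale the axes to the range of newX and pred, which is the misalignment described in the question.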

Alternatively, you could also use the lines() function, which will draw
point to point lines:

lines(newX, myPred, col = "red")

If you want fitted lines and prediction/confidence intervals, you could
use a function like matlines(), presuming that a predict method exists
for the model type you want to use.

There is an example of using this in R Help Desk in R News Vol 3
Number 2 (October 2003), in the first example, with a standard linear
regression model.
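
A small sketch of the matlines() idea for the linear model in this thread, not taken from the R News article. It assumes the model class has a predict method that accepts interval = "confidence", as predict.lm does; the names fit and pred are made up for the example.

```r
set.seed(42)

x <- rnorm(100, 10, 10)
e <- rnorm(100, 0, 5)
y <- 5 + 10 * x + e

fit  <- lm(y ~ x)
newX <- seq(1, 10, 1)

# With interval = "confidence", predict.lm returns a matrix with
# columns fit, lwr and upr (one row per value of newX).
pred <- predict(fit, data.frame(x = newX), interval = "confidence")
print(colnames(pred))

plot(x, y)

# matlines() draws one line per column of pred on the existing axes:
# solid for the fitted values, dashed for the interval bounds.
matlines(newX, pred, lty = c(1, 2, 2), col = "red")
```

For prediction rather than confidence intervals, interval = "prediction" can be used in the same call.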

HTH,

Marc Schwartz

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html