[R] bug in rpart?

2009-05-22 Thread Yuanyuan
Greetings,

I checked the Pima Indians diabetes data again and got one tree for the data
with reordered columns and another tree for the original data. Comparing the
two trees, the split points are exactly the same, but the fitted classes
differ for some cases, and the misclassification errors differ too. I know
how CART deals with ties: even with the same data, the subjects sent to the
left and right may not be the same if we just rearrange the order of the
covariates.

But the problem is that the fitted trees have exactly the same split points.
Shouldn't we get the same fitted values if the decisions are the same at
each step? Why do trees with the same structure have different observations
in their nodes?

The source code for running the diabetes data example and the output of
trees are attached. Your professional opinion is very much appreciated.

library(mlbench)
data(PimaIndiansDiabetes2)
mydata <- PimaIndiansDiabetes2
library(rpart)
fit2 <- rpart(diabetes ~ ., data = mydata, method = "class")
plot(fit2, uniform = TRUE, main = "CART for original data")
text(fit2, use.n = TRUE, cex = 0.6)
printcp(fit2)
table(predict(fit2, type = "class"), mydata$diabetes)
## misclassification table: rows are fitted class
##       neg pos
##   neg 437  68
##   pos  63 200


## columns 2 (glucose) and 6 (mass) exchanged
pmydata <- data.frame(mydata[, c(1, 6, 3, 4, 5, 2, 7, 8, 9)])
fit3 <- rpart(diabetes ~ ., data = pmydata, method = "class")
plot(fit3, uniform = TRUE, main = "CART after exchanging mass & glucose")
text(fit3, use.n = TRUE, cex = 0.6)
printcp(fit3)
table(predict(fit3, type = "class"), pmydata$diabetes)
## after exchanging the order of BODY mass and PLASMA glucose
##       neg pos
##   neg 436  64
##   pos  64 204


Best,

-- 
--
Yuanyuan Huang
Email: sunnyua...@gmail.com


ReorderedTree.pdf
Description: Adobe PDF document


OriginalTree.pdf
Description: Adobe PDF document
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] bug in rpart?

2009-05-22 Thread Uwe Ligges



Yuanyuan wrote:

> Greetings,
>
> I checked the Pima Indians diabetes data again and got one tree for the
> data with reordered columns and another tree for the original data.
> Comparing the two trees, the split points are exactly the same, but the
> fitted classes differ for some cases, and the misclassification errors
> differ too. I know how CART deals with ties: even with the same data, the
> subjects sent to the left and right may not be the same if we just
> rearrange the order of the covariates.
>
> But the problem is that the fitted trees have exactly the same split
> points. Shouldn't we get the same fitted values if the decisions are the
> same at each step? Why do trees with the same structure have different
> observations in their nodes?



Because they may use different surrogate variables. Note that your data 
contain missing values that are handled by surrogates.
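
One way to check this is sketched below. It is only a rough sketch, not a
definitive diagnosis: it assumes the mlbench and rpart packages are
installed, and the object names (na_counts, differ, cc, pcc, fit2cc, fit3cc)
are made up for illustration. It counts the missing values, prints the
primary and surrogate splits for both fits, lists the cases that are
classified differently, and then refits on complete cases only, where no
surrogate is needed.

```r
library(mlbench)  # provides PimaIndiansDiabetes2
library(rpart)

data(PimaIndiansDiabetes2)
mydata  <- PimaIndiansDiabetes2
pmydata <- mydata[, c(1, 6, 3, 4, 5, 2, 7, 8, 9)]  # swap glucose and mass

# How many values are missing in each column?
na_counts <- colSums(is.na(mydata))
print(na_counts)

fit2 <- rpart(diabetes ~ ., data = mydata,  method = "class")
fit3 <- rpart(diabetes ~ ., data = pmydata, method = "class")

# summary() lists the primary and surrogate splits at each node, so the
# two listings can be compared node by node.
summary(fit2)
summary(fit3)

# Which cases are classified differently, and do they have missing values?
p2 <- as.character(predict(fit2, type = "class"))
p3 <- as.character(predict(fit3, type = "class"))
differ <- which(p2 != p3)
print(mydata[differ, ])

# Refit on complete cases only, so that no surrogate split is ever used.
cc  <- na.omit(mydata)
pcc <- cc[, c(1, 6, 3, 4, 5, 2, 7, 8, 9)]
fit2cc <- rpart(diabetes ~ ., data = cc,  method = "class")
fit3cc <- rpart(diabetes ~ ., data = pcc, method = "class")

# Number of cases on which the two complete-case fits disagree
sum(as.character(predict(fit2cc, type = "class")) !=
    as.character(predict(fit3cc, type = "class")))
```

If the rows printed from mydata[differ, ] all have an NA in a variable used
for a split, that points to the surrogates. Ties between equally good splits
can still depend on column order, so the last count is not guaranteed to be
zero.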


Best,
Uwe Ligges





