[R] Logistic Models, Cross Validation, error tables , Monte Carlo Simulation?

Tom Willems Wed, 08 Aug 2007 09:30:15 -0700

Thanks Mr. Ellison,

Your remark helped solve my error table problem.


However, I found a new one.

Now that I have my error tables, I realise that it is not good statistical 
practice to validate a model on the basis of a single error table.

So I should use a tool such as K-fold cross-validation (CV).


ex:     binary_model <- glm(y_binary ~ x_value, family = binomial, data = dataset)

        cv.binary(binary_model, rand = NULL, nfolds = 1000, print.details = TRUE)


This works without problems for the binary model, but not for the odds 
model.

Do you know a tool that can do this, or perhaps a way to implement it in 
a Monte Carlo simulation?
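
For illustration, a Monte Carlo validation of the kind asked about could be sketched as follows. This is only a sketch under assumptions: the data are simulated, the odds model is assumed to be a binomial glm fitted to successes and failures per interval, and the names mc_cv_once, test_fraction and cutoff are hypothetical. Each repetition splits the individual cases at random, refits the grouped model on the training part, and scores the held-out cases.

```r
set.seed(1)

# Hypothetical individual-level data: x_value is the interval, y_binary is 0/1.
n <- 240
dataset <- data.frame(x_value = rep(c(1, 2, 3, 4), each = n / 4))
dataset$y_binary <- rbinom(n, size = 1,
                           prob = plogis(-2 + 0.9 * dataset$x_value))

# One Monte Carlo repetition: random split, grouped fit, test-set accuracy.
mc_cv_once <- function(data, test_fraction = 0.25, cutoff = 0.5) {
  test_idx <- sample(nrow(data), size = round(test_fraction * nrow(data)))
  train <- data[-test_idx, ]
  test  <- data[test_idx, ]

  # Collapse the training cases to successes and failures per interval.
  successes <- tapply(train$y_binary, train$x_value, sum)
  totals    <- tapply(train$y_binary, train$x_value, length)
  grouped <- data.frame(x_value   = as.numeric(names(totals)),
                        successes = as.vector(successes),
                        failures  = as.vector(totals - successes))

  # Grouped ("odds") model: binomial glm on the per-interval counts.
  fit <- glm(cbind(successes, failures) ~ x_value,
             family = binomial, data = grouped)

  # Classify the held-out individual cases and return the proportion correct.
  p_hat <- predict(fit, newdata = test, type = "response")
  mean((p_hat > cutoff) == (test$y_binary == 1))
}

# Repeat the random split many times and summarise the spread.
rates <- replicate(200, mc_cv_once(dataset))
mean(rates)
sd(rates)
```

The spread of rates over the repetitions gives an idea of how much a single error table can vary, which is the concern raised above. Replacing the random split by a fixed partition into K folds would turn the same loop into K-fold CV.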

(I have added the way I solved the error table problem below.)


Kind regards,

Tom.



>ERROR TABLE DILEMMA

>For a binary model there is no problem (here y has an outcome of 0 or 1)

>

>ex:    pred_binary_model=(expit(predict(binary_model,tsample))>P)

>       table_binary_model=table(pred_binary_model,tsample[,2])

>       TER_binary_model=sum(diag(table_binary_model[,]))/sum(table_binary_model)

>

>       (error table1)

>       pred_binary_model       0   1

>                       FALSE   28 95

>                       TRUE    4 114

>       [1] 0.5892116 --> proportion of correctly classified cases

>

>Here 28 + 114 = 142 of the 241 test cases are predicted correctly, which gives 58.9% correctly classified cases.

>A few more misclassified cases do not move this 58.9% by much.

>

>When I perform this on categorical data, I have to use frequency tables.

>The model then predicts the number of successes and the number of failures per interval (the odds per interval).

>So the error table contains one odds outcome for every interval.

>

>ex: (error table2)

>       oddsPt

>       pred_percent_model      0.00 0.16 0.37 0.84

>                       0.05       1    0    0    0

>                       0.16       0    1    0    0

>                       0.34       0    0    1    0

>                       0.78       0    0    0    1

>       [1] 1 --> proportion of correctly classified cases

>

>As you can see, a single misclassification has a disproportionate effect (a change of about 25%).

>The output of error table 2 is interpretable, but it is not ideal: it is oversensitive to misclassification.

>

>I was able to solve this latter problem by extracting the model coefficients and then using them in a function.

>Based on this function, I was able to produce an error table of the same form as table 1.
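
The coefficient-based approach described in the quoted text could look roughly like the sketch below. The original function was not posted, so everything here is an assumption: the per-interval counts and test cases are made up, and the names freq, odds_model and predict_prob are hypothetical. Only tsample and P follow the names used in the binary example.

```r
# Hypothetical per-interval counts (not the poster's data).
freq <- data.frame(x_value   = c(1, 2, 3, 4),
                   successes = c(3, 10, 22, 50),
                   failures  = c(57, 50, 38, 10))

# Odds model: binomial glm on successes and failures per interval.
odds_model <- glm(cbind(successes, failures) ~ x_value,
                  family = binomial, data = freq)

# Extract the coefficients and wrap them in a stand-alone function
# that returns the predicted probability for an individual case.
b <- unname(coef(odds_model))
predict_prob <- function(x) plogis(b[1] + b[2] * x)

# Hypothetical individual test cases with a known 0/1 outcome.
tsample <- data.frame(x_value  = c(1, 1, 2, 3, 3, 4, 4, 4),
                      y_binary = c(0, 0, 0, 1, 0, 1, 1, 0))

# Classify each case with cutoff P and build a 2 x 2 style error table.
P <- 0.5
pred <- predict_prob(tsample$x_value) > P
error_table <- table(predicted = pred, observed = tsample$y_binary)

# Proportion correct, computed without diag() so that it still works
# when only one predicted class occurs and the table is not square.
correct_rate <- mean(pred == (tsample$y_binary == 1))
error_table
correct_rate
```

Because each test case is classified individually, one misclassified case changes the rate by 1/n rather than by a whole interval, which avoids the oversensitivity seen in error table 2.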

 



______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
