Re: [R] Splitting data into test and train (80:20) kepping attributes similar

2012-04-26 Thread Dwaipayan Dasgupta
Well, it throws an error, because there is no such function in default R. A bit of googling showed it might be the one in the caTools package

Re: [R] Splitting data into test and train (80:20) kepping attributes similar

2012-04-26 Thread Frank Harrell
You can run simulations to find out how large N must be so that split sample validation yields sufficient precision to be trustworthy, in other words, that different random splits provide the same estimate of model accuracy to within some small tolerance. You will be surprised how large N must be
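A rough sketch of the kind of simulation described above, not the poster's own code: it repeats random 80:20 splits on simulated data and reports how much the test-set accuracy varies from split to split at several sample sizes. The data-generating model, the sample sizes, the number of splits and the helper name `split_spread` are all assumptions made for illustration.

```r
set.seed(1)

# Standard deviation of test accuracy across repeated random 80:20 splits
# of one simulated data set of size n (hypothetical helper).
split_spread <- function(n, n_splits = 50) {
  x <- rnorm(n)
  y <- rbinom(n, 1, plogis(-1 + x))   # assumed data-generating model
  d <- data.frame(x = x, y = y)
  acc <- replicate(n_splits, {
    idx  <- sample(n, round(0.8 * n))                 # training row positions
    fit  <- glm(y ~ x, family = binomial, data = d[idx, ])
    pred <- predict(fit, newdata = d[-idx, ], type = "response") > 0.5
    mean(pred == (d$y[-idx] == 1))                    # accuracy on held-out 20%
  })
  sd(acc)
}

sizes  <- c(200, 2000, 20000)
spread <- sapply(sizes, split_spread)
data.frame(N = sizes, sd_accuracy = spread)
```

If the spread at the N you actually have is larger than the tolerance you care about, a single 80:20 split is unlikely to give a stable estimate.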

Re: [R] Splitting data into test and train (80:20) kepping attributes similar

2012-04-26 Thread Jessica Streicher
Don't know what's wrong there (except if you're using the Eclipse R plugin on a Mac like me and the window for choosing the download site doesn't pop up.. did

Re: [R] Splitting data into test and train (80:20) kepping attributes similar

2012-04-26 Thread Dwaipayan Dasgupta
Be reminded that s1 and s2 are only the indexes on Ad_0 and Ad_1 of the data which you want to keep. Therefore traindata

Re: [R] Splitting data into test and train (80:20) kepping attributes similar

2012-04-26 Thread Jessica Streicher
Ad_1 <- subset(Attrition_data_1, Attrition_ind == 1)
Ad_0 <- subset(Attrition_data_1, Attrition_ind == 0)
s1 <- sample(1:dim(Ad_0)[1], 0.8 * dim(Ad_0)[1])  # 80% of the non-attritees
s2 <- sample(1:dim(Ad_1)[1], 0.8 * dim(Ad_1)[1])  # 80% of the attritees
s3 <- Ad_0[-s1, ]
summary(s3)
s4 <- Ad_1[-s2, ]
summary(s4)
s5 <-
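The archived snippet is cut off after `s4`. A minimal sketch of how the per-class index vectors could be turned into train and test sets, assuming a synthetic stand-in for `Attrition_data_1` (the `tenure` column, the 1000 rows and the 20% attrition rate are made up for illustration and are not from the thread):

```r
set.seed(42)  # reproducible split

# Hypothetical stand-in for Attrition_data_1
Attrition_data_1 <- data.frame(
  tenure        = runif(1000, 0, 10),
  Attrition_ind = rep(c(0, 1), times = c(800, 200))
)

Ad_1 <- subset(Attrition_data_1, Attrition_ind == 1)
Ad_0 <- subset(Attrition_data_1, Attrition_ind == 0)

# Row positions (not the rows themselves) for 80% of each class
s1 <- sample(seq_len(nrow(Ad_0)), round(0.8 * nrow(Ad_0)))
s2 <- sample(seq_len(nrow(Ad_1)), round(0.8 * nrow(Ad_1)))

# Selected positions go to train, the remaining rows go to test
traindata <- rbind(Ad_0[s1, ], Ad_1[s2, ])
testdata  <- rbind(Ad_0[-s1, ], Ad_1[-s2, ])

# Class proportions should match in both sets
prop.table(table(traindata$Attrition_ind))
prop.table(table(testdata$Attrition_ind))
```

Sampling within each class separately is what keeps the attritee:non-attritee ratio the same in both parts; a single `sample()` over all rows would only match it approximately.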

Re: [R] Splitting data into test and train (80:20) kepping attributes similar

2012-04-25 Thread Dwaipayan Dasgupta
Thanks in advance doy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dwaipayan Dasgupta Sent: Tuesday, April 24, 2012 9:08 PM To: r-help@r-project.org Subject: [R] Splitting data into test and train (80:20) kepping attributes

Re: [R] Splitting data into test and train (80:20) kepping attributes similar

2012-04-25 Thread Jessica Streicher
in advance doy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dwaipayan Dasgupta Sent: Tuesday, April 24, 2012 9:08 PM To: r-help@r-project.org Subject: [R] Splitting data into test and train (80:20) kepping attributes

Re: [R] Splitting data into test and train (80:20) kepping attributes similar

2012-04-25 Thread Dwaipayan Dasgupta
Well, it throws an error, because there is no such function in default R. A bit of googling showed it might be the one in the caTools package. Execute this: install.packages("caTools") library(caTools) before
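The snippet does not name the missing function. One caTools function that performs a label-stratified split is `sample.split`, and the sketch below assumes that is the one meant; the indicator vector `y` is hypothetical. The block only runs if caTools is already installed.

```r
# install.packages("caTools")  # run once if the package is not installed

if (requireNamespace("caTools", quietly = TRUE)) {
  set.seed(123)
  y <- rep(c(0, 1), times = c(800, 200))   # hypothetical attrition indicator

  # Logical vector: TRUE marks rows assigned to the training set,
  # drawn so that the label proportions are preserved
  in_train <- caTools::sample.split(y, SplitRatio = 0.8)

  print(table(y, in_train))
}
```

With a data frame `d` holding the indicator, `d[in_train, ]` and `d[!in_train, ]` would then give the train and test parts.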

Re: [R] Splitting data into test and train (80:20) kepping attributes similar

2012-04-25 Thread Jessica Streicher
Hi, I am trying to do some predictive modeling around attrition and want to split the dataset into test and train (80:20) and keep the ratio of attritees:non-attritees the same. In my dataset the attrition indicator is coded as 0 (for non

[R] Splitting data into test and train (80:20) kepping attributes similar

2012-04-24 Thread Dwaipayan Dasgupta
Hi, I am trying to do some predictive modeling around attrition and want to split the dataset into test and train (80:20) and keep the ratio of attritees:non-attritees the same. In my dataset the attrition indicator is coded as 0 (for non-attritees) and 1 (for attritees) and I want to keep the ratio