[R] Using sample to create Training and Test sets

2009-05-15 Thread Chris Arthur
Forgive the newbie question. I want to select random rows from my 
data.frame to create a test set (which I can do), but then I want to 
create a training set using what's left over.


Example code:
acc <- read.table("accOUT.txt", header=T, sep = ",", row.names=1)
#select 400 random rows in data
training <- acc[sample(1:nrow(acc), 400, replace=TRUE),]

#try to get what's left of acc not in training
testset <- acc[-training, ]
Fails with the following error:
Error: invalid subscript type
In addition: Warning message:
- not meaningful for factors in: Ops.factor(left)

I then try:
testset <- acc[!training, ]
Which gives me the warning message:
! not meaningful for factors in: Ops.factor(left)
And if I look at testset, it is 400 rows of NAs, which clearly isn't 
right.


Can anyone tell me what I'm doing wrong?

Thanks in advance

Chris

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using sample to create Training and Test sets

2009-05-15 Thread Frank E Harrell Jr
Note that the single split sample technique is not competitive with 
other approaches unless the sample size exceeds around 20,000.


Frank


--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University
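
The note above does not name the other approaches. Purely as an illustration, resampling schemes such as k-fold cross-validation let every row serve in both fitting and testing, instead of holding out one fixed block. A minimal sketch, assuming the built-in iris data and a simple lm() fit as stand-ins (neither comes from this thread):

```r
# Hypothetical stand-ins: iris replaces the poster's data, lm() replaces their model.
set.seed(1)
dat <- iris
k <- 5

# Assign each row to one of k folds of (nearly) equal size, in random order.
fold <- sample(rep(seq_len(k), length.out = nrow(dat)))

rmse <- numeric(k)
for (f in seq_len(k)) {
  training <- dat[fold != f, ]   # rows outside fold f
  testset  <- dat[fold == f, ]   # rows in fold f
  fit  <- lm(Sepal.Length ~ Petal.Length, data = training)
  pred <- predict(fit, newdata = testset)
  rmse[f] <- sqrt(mean((testset$Sepal.Length - pred)^2))
}

# Average held-out error across the k folds.
mean(rmse)
```

Each row is held out exactly once, so the error estimate does not hinge on a single random split.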



Re: [R] Using sample to create Training and Test sets

2009-05-15 Thread Liaw, Andy
Here's one possibility:

idx <- sample(nrow(acc))
training <- acc[idx[1:400], ]
testset <- acc[-idx[1:400], ]

Andy
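
The original attempt fails because training is a data frame, not a set of row numbers, so -training and !training apply the operators to its columns (hence the Ops.factor warnings). In addition, replace=TRUE can draw the same row more than once. A self-contained sketch of the index-based approach above, assuming the built-in iris data as a stand-in for acc and 100 training rows instead of 400 (iris has only 150 rows):

```r
set.seed(123)
acc <- iris          # stand-in for the data read from accOUT.txt
n_train <- 100       # assumed size; the thread uses 400

# Shuffle the row numbers once, without replacement.
idx <- sample(nrow(acc))
train_rows <- idx[seq_len(n_train)]

training <- acc[train_rows, ]    # the selected rows
testset  <- acc[-train_rows, ]   # negative row numbers drop those rows

# The two sets partition the data: no overlap, nothing lost.
stopifnot(nrow(training) + nrow(testset) == nrow(acc))
stopifnot(length(intersect(rownames(training), rownames(testset))) == 0)

# training is a data frame, which is why acc[-training, ] cannot work.
is.data.frame(training)
```

Keeping the row numbers in their own variable (train_rows here, a name chosen for illustration) is what makes the complement easy to take.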



Re: [R] Using sample to create Training and Test sets

2009-05-15 Thread Max Kuhn
> Forgive the newbie question, I want to select random rows from my
> data.frame to create a test set (which I can do) but then I want to
> create a training set using what's left over.


The caret package has a function, createDataPartition, that does the
split taking into account the distribution of the outcome. This might
be good in classification cases where one or more classes have low
percentages in the data set.

There is more detail in the pdf:

 http://cran.r-project.org/web/packages/caret/vignettes/caretMisc.pdf

and examples in this pdf

  http://cran.r-project.org/web/packages/caret/vignettes/caretTrain.pdf

Max
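
To show what a split that respects the outcome distribution looks like, here is a minimal base-R sketch that samples the same fraction within each class. The iris data, the 80% fraction, and the variable names are assumptions for illustration. The caret call is attempted only if the package is installed.

```r
set.seed(42)
dat <- iris    # stand-in data; Species plays the role of the outcome
p <- 0.8       # assumed training fraction

# Row numbers grouped by outcome class.
by_class <- split(seq_len(nrow(dat)), dat$Species)

# Draw the same fraction from every class, without replacement.
# Indexing with sample.int() avoids sample()'s special case for length-1 input.
train_idx <- unlist(lapply(by_class, function(i) {
  i[sample.int(length(i), size = round(p * length(i)))]
}), use.names = FALSE)

training <- dat[train_idx, ]
testset  <- dat[-train_idx, ]

# Class proportions are preserved in both sets.
table(training$Species)
table(testset$Species)

# The caret equivalent, run only when caret is available.
if (requireNamespace("caret", quietly = TRUE)) {
  in_train <- caret::createDataPartition(dat$Species, p = p, list = FALSE)
  caret_training <- dat[in_train, ]
  caret_testset  <- dat[-in_train, ]
}
```

With a rare class, a plain random split can leave very few of its rows in one of the sets; sampling within each class avoids that.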
