[R] Save Cluster results to data frame

2009-05-18 Thread Chris Arthur
If I cluster my data into 3 sets, using pam for instance, is there a way 
to save the resulting cluster assignments back to the originating data frame? 
Related to that, how do I change the cluster names to something a 
bit more meaningful than 1, 2, 3?


So it goes like this:

Data -> cluster into 3 groups -> give them meaningful names
     -> output back to data frame


Thanks for the help

Chris
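
A minimal sketch of one way to do this, assuming pam() from the cluster package and using the built-in iris measurements as a stand-in for the real data. The names dat, fit and the group labels are hypothetical. pam() returns one cluster number per row in its clustering component, which can be stored as a labelled factor column:

```r
library(cluster)  # provides pam()

# Stand-in for the original data frame (assumption: numeric columns only)
dat <- iris[, 1:4]

# Partition the rows into 3 clusters
fit <- pam(dat, k = 3)

# fit$clustering holds one integer (1, 2 or 3) per row of dat.
# Store it as a factor and replace the numbers with readable labels.
dat$cluster <- factor(fit$clustering,
                      levels = 1:3,
                      labels = c("groupA", "groupB", "groupC"))  # hypothetical names

# Count of rows per named cluster
table(dat$cluster)
```

The labels are matched to 1, 2, 3 in order, so it is worth inspecting fit$medoids first to see which cluster is which before choosing names.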

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Split data frame based on Class

2009-05-18 Thread Chris Arthur
Each row of my data frame is assigned to a class (e.g. country). Can you 
suggest how to break apart the data frame so that I get a new data 
frame for each class?


e.g.

If Class = US, put the row in a new data frame dataUS

Thanks in advance for your help

Chris
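
One possible approach, sketched with a made-up data frame (dat, its columns and its values are assumptions): base R's split() divides a data frame by a grouping column and returns a named list with one data frame per class.

```r
# Hypothetical data frame with a Class column
dat <- data.frame(Class = c("US", "UK", "US", "FR"),
                  value = c(1.2, 3.4, 5.6, 7.8))

# One data frame per distinct value of Class, collected in a named list
by_class <- split(dat, dat$Class)

# Pull out a single class by name
dataUS <- by_class[["US"]]
dataUS
```

Keeping the pieces in a list is usually easier to work with than creating one separately named variable per class, since the list can be looped over with lapply().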



[R] Using sample to create Training and Test sets

2009-05-15 Thread Chris Arthur
Forgive the newbie question. I want to select random rows from my 
data.frame to create a test set (which I can do), but then I want to 
create a training set using what's left over.


Example code:
acc <- read.table("accOUT.txt", header=T, sep = ",", row.names=1)
#select 400 random rows in data
training <- acc[sample(1:nrow(acc), 400, replace=TRUE),]

#try to get what's left of acc not in training
testset <- acc[-training, ]
Fails with the following error
Error: invalid subscript type
In addition: Warning message:
- not meaningful for factors in: Ops.factor(left)

I then try:
testset <- acc[!training, ]
Which gives me the warning message
! not meaningful for factors in: Ops.factor(left)
And if I look at testset, it is 400 rows of NAs, which clearly isn't 
right.


Can anyone tell me what I'm doing wrong?

Thanks in advance

Chris
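
The errors above arise because training is itself a data frame, while negative subscripts need a vector of row numbers. A minimal sketch of one fix is to keep the sampled indices in their own variable and reuse them for both subsets. The simulated acc below is a stand-in for accOUT.txt, and idx is a hypothetical name. Sampling without replacement also avoids picking the same row twice.

```r
set.seed(1)  # only to make this sketch reproducible

# Stand-in for the data read from accOUT.txt (assumption: 1000 rows)
acc <- data.frame(x = rnorm(1000),
                  y = factor(sample(c("a", "b"), 1000, replace = TRUE)))

# Draw 400 distinct row numbers
idx <- sample(nrow(acc), 400)

training <- acc[idx, ]   # the 400 sampled rows
testset  <- acc[-idx, ]  # every row NOT in idx

nrow(training)
nrow(testset)
```

With replace=TRUE, as in the original code, the 400 draws can contain duplicates, so the two sets would no longer add up to the full data.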



[R] data frame titles

2008-06-04 Thread Chris Arthur
I'm new to R so please forgive the newbie question, but I can't seem to 
find a definitive answer to this.


I want to do a PLS regression on some data, which takes a formula 
of the type

responseACC ~ dataACC

where dataACC is multivariate in nature and responseACC is a single value.

I have imported my data from a csv file into a data frame called, for 
argument's sake, df.

This file has no headers in it, so I get column names X1 ... X83

I then create a new data frame with
data.frame(dataACC = df[1:82], responseACC = df[83])

When I look at the names of this, though, I get...

dataACC.X1 ... dataACC.X82 and then X83

When I pass this to the PLS algorithm I get the error

variable responseACC not found

So a few questions: am I doing this the right way in general? And why is 
the name responseACC not being associated with df[83]?


Thirdly, if I want to generate my own array of values in order to get 
a predicted response based on my initial PLS regression, will it matter 
that the array data will not have header values (e.g. dataACC.X1 etc.)? If 
so, what is the best way of appending the header data?


Thanks for any help you can lend

Chris
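
A sketch of why the names come out this way, using simulated data in place of the csv file (pls_data, newX and newdata are hypothetical names). df[83] is a one-column data frame, so data.frame() keeps its existing column name X83, and df[1:82] is spread out into 82 prefixed columns. Extracting the response as a vector with df[[83]], and wrapping the predictors as a matrix in I(), gives two variables that a formula of the form responseACC ~ dataACC can find:

```r
set.seed(1)  # only to make this sketch reproducible

# Stand-in for the imported csv: 100 rows, columns X1 ... X83
df <- as.data.frame(matrix(rnorm(100 * 83), ncol = 83))
names(df) <- paste0("X", 1:83)

# df[[83]] is a plain vector, so the name responseACC is used.
# I(as.matrix(...)) stores all 82 predictors as ONE matrix column.
pls_data <- data.frame(dataACC     = I(as.matrix(df[1:82])),
                       responseACC = df[[83]])
names(pls_data)

# Check that the formula now resolves both variables
mf <- model.frame(responseACC ~ dataACC, data = pls_data)
dim(mf$dataACC)

# New predictor values are wrapped the same way, under the same name
newX <- matrix(rnorm(5 * 82), ncol = 82)
newdata <- data.frame(dataACC = I(newX))
```

The model.frame() call is only a check that the variables are found; the actual fit would use whichever PLS function is in use, with pls_data passed as its data argument.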
