[R] Save Cluster results to data frame
If I cluster my data into 3 sets, using pam for instance, is there a way to save the resulting cluster assignments back to the originating data frame? Related to that, how do I change the cluster names to something more meaningful than 1, 2, 3?

So it goes like this: data -> cluster into 3 groups -> give them meaningful names -> output back to the data frame.

Thanks for the help,
Chris

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
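One possible approach, sketched under assumptions: pam() from the cluster package returns an object whose clustering component holds one integer cluster id per row, in the original row order, so it can be assigned straight to a new column. The example data (two columns of the built-in iris data set), the column names cluster and cluster_name, and the labels groupA/groupB/groupC are all made up for illustration.

```r
library(cluster)  # provides pam()

# Hypothetical example data: two numeric columns of the built-in iris data set
dat <- iris[, c("Petal.Length", "Petal.Width")]

# Partition the rows into 3 clusters around medoids
fit <- pam(dat, k = 3)

# fit$clustering has one integer id (1..3) per row, in row order,
# so it can be stored as a new column of the originating data frame
dat$cluster <- fit$clustering

# Map the ids to more meaningful labels (label names are illustrative only)
dat$cluster_name <- factor(dat$cluster, levels = 1:3,
                           labels = c("groupA", "groupB", "groupC"))

table(dat$cluster_name)
```

Which label belongs to which id has to be decided by inspecting the clusters (for example via fit$medoids); the numbering itself carries no meaning.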
[R] Split data frame based on Class
Each row of my data frame is assigned to a class (e.g. country). Can you suggest how I can break the data frame apart so that I get a new data frame for each class? For example, if Class = US, put the row in a new data frame dataUS.

Thanks in advance for your help,
Chris
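A minimal sketch of one way to do this with base R's split(), which returns a named list holding one data frame per class. The data frame dat and its value column are hypothetical stand-ins; only the Class column and the name dataUS come from the question.

```r
# Hypothetical data frame with a Class column
dat <- data.frame(Class = c("US", "UK", "US", "FR", "UK"),
                  value = c(10, 20, 30, 40, 50))

# split() returns a named list with one data frame per distinct Class value
by_class <- split(dat, dat$Class)

# Pull out a single class by name; same rows as dat[dat$Class == "US", ]
dataUS <- by_class[["US"]]

names(by_class)
```

Keeping the pieces in the list is often more convenient than creating one variable per class, since the list can be looped over with lapply().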
[R] Using sample to create Training and Test sets
Forgive the newbie question. I want to select random rows from my data.frame to create a test set (which I can do), but then I want to create a training set using what's left over.

Example code:

    acc <- read.table("accOUT.txt", header=T, sep=",", row.names=1)
    # select 400 random rows in data
    training <- acc[sample(1:nrow(acc), 400, replace=TRUE),]
    # try to get what's left of acc not in training
    testset <- acc[-training, ]

This fails with the following error:

    Error: invalid subscript type
    In addition: Warning message:
    - not meaningful for factors in: Ops.factor(left)

I then try:

    testset <- acc[!training, ]

which gives me the warning message

    ! not meaningful for factors in: Ops.factor(left)

And if I look at testset, it is 400 rows of NAs, which clearly isn't right.

Can anyone tell me what I'm doing wrong?

Thanks in advance,
Chris
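The code above fails because training is a data frame, not a vector of row numbers, so -training and !training try to negate the data itself (including its factor columns). A hedged sketch of the usual fix is to keep the sampled row indices in their own variable and reuse them for both subsets. The data frame below is a hypothetical stand-in for acc, and idx is an illustrative name.

```r
set.seed(42)  # only to make the illustration repeatable

# Hypothetical stand-in for acc: 1000 rows, one numeric and one factor column
acc <- data.frame(x = rnorm(1000),
                  grp = factor(sample(c("a", "b"), 1000, replace = TRUE)))

# Sample 400 row indices WITHOUT replacement, so no row is picked twice
idx <- sample(nrow(acc), 400)

training <- acc[idx, ]    # the 400 sampled rows
testset  <- acc[-idx, ]   # negative indices drop those rows, leaving the rest

nrow(training)
nrow(testset)
```

Note that replace=TRUE in the original call would allow the same row to be drawn more than once, so the two sets would no longer partition the data cleanly.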
[R] data frame titles
I'm new to R, so please forgive the newbie question, but I can't seem to find a definitive answer to this.

I want to do a PLS regression on some data. It takes a formula of the type responseACC ~ dataACC, where dataACC is multivariate in nature and responseACC is a single value.

I have imported my data from a csv file into a data frame called, for argument's sake, df. The file has no headers in it, so I get column names X1 ... X83.

I then create a new data frame with

    data.frame(dataACC = df[1:82], responseACC = df[83])

When I look at the names of this, though, I get dataACC.X1 ... dataACC.X82 and then X83.

When I pass this to the PLS algorithm I get the error

    variable responseACC not found

So a few questions:

1. Am I doing this in the right way in general?
2. Why is responseACC not being used as the name of df[83]?
3. If I want to generate my own array of values in order to get a predicted response based on my initial PLS regression, will it matter that the array will not have header values (e.g. dataACC.X1 etc.)? If so, what is the best way of adding the header data?

Thanks for any help you can lend,
Chris
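A sketch of one common way to build such a data frame, under assumptions: df[83] is a one-column data frame and appears to keep its own column name (X83) inside data.frame(), whereas df[[83]] is a plain vector and takes the name it is given. Wrapping the predictors in I(as.matrix(...)) stores them as a single matrix column called dataACC rather than 82 separate columns. The random data and the name pls_data are hypothetical; the PLS fitting call itself is not shown because the package in use is not named in the question.

```r
set.seed(1)  # only to make the illustration repeatable

# Hypothetical stand-in for the imported csv: 20 rows, 83 columns X1..X83
df <- as.data.frame(matrix(rnorm(20 * 83), nrow = 20))
names(df) <- paste0("X", 1:83)

# df[[83]] extracts the response as a plain vector, so the name
# responseACC is used; I() keeps the 82 predictors together as one
# matrix column named dataACC
pls_data <- data.frame(responseACC = df[[83]],
                       dataACC = I(as.matrix(df[, 1:82])))

names(pls_data)
dim(pls_data$dataACC)
```

With this layout, a formula such as responseACC ~ dataACC can find both variables in pls_data. New data for prediction would presumably need to be built the same way, as a data frame with a matrix column named dataACC that has the same 82 columns.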