[R] subset data based on values in multiple columns
Dear list members,

I am trying to create a subset of a data frame based on conditions in two columns, and after spending much time trying (and searching R-help) have not had any luck. Essentially, I have a data frame that is something like this:

    date <- as.POSIXct(as.character(c("2012-01-25","2012-01-25","2012-01-26","2012-01-27","2012-01-27","2012-01-27")))
    time <- as.POSIXct(as.character(c("13:20", "13:40", "14:00", "10:00", "10:20", "10:20")), format="%H:%M")
    count <- c(12,14,11,12,12,8)
    data <- data.frame(date,time,count)

which looks like:

            date     time count
    1 2012-01-25 13:20:00    12
    2 2012-01-25 13:40:00    14
    3 2012-01-26 14:00:00    11
    4 2012-01-27 10:00:00    12
    5 2012-01-27 10:20:00    12
    6 2012-01-27 10:20:00     8

I would like to create a subset by doing the following: for each unique date, only include one case, which will be the case with the max value for the column labelled count. So the resulting subset would be:

            date     time count
    2 2012-01-25 13:40:00    14
    3 2012-01-26 14:00:00    11
    4 2012-01-27 10:00:00    12

Some dates have two cases at which the count was the same, but I only want to include one case (I don't really mind which case it chooses, but if need be it could be based on the earliest time for which the same counts occurred). I have tried various loops with no success! I'm sure that there is an easy answer that I have not found! Any help is much appreciated!!

All the best,
Chandra

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
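One way the selection described above might be done without a loop is to sort the rows and then keep the first row per date. This is only a sketch under some assumptions: the data frame is named dat here (to avoid masking the base function data), and the times are kept as zero-padded "HH:MM" character strings so that they sort correctly as text.

```r
# Example data from the post; 'dat' is a name chosen for this sketch,
# and times are stored as zero-padded character strings (an assumption).
dat <- data.frame(
  date  = as.Date(c("2012-01-25", "2012-01-25", "2012-01-26",
                    "2012-01-27", "2012-01-27", "2012-01-27")),
  time  = c("13:20", "13:40", "14:00", "10:00", "10:20", "10:20"),
  count = c(12, 14, 11, 12, 12, 8)
)

# Sort by date, then by count (largest first), then by time (earliest first)
ord <- dat[order(dat$date, -dat$count, dat$time), ]

# Keep only the first row for each date: the max count, earliest time on ties
res <- ord[!duplicated(ord$date), ]
print(res)
```

Because ties on count are broken by the time column in the sort, the earliest of the tied rows is the one that survives.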
[R] Loops to assign a unique ID to a column
Dear R help,

I am fairly new to data management and programming in R, and am trying to write what is probably a simple loop, but am not having any luck. I have a data frame with something like the following (but much bigger):

    Dates <- c("12/10/2010","12/10/2010","12/10/2010","13/10/2010","13/10/2010","13/10/2010")
    Groups <- c("A","B","B","A","B","C")
    data <- data.frame(Dates, Groups)

I would like to create a new column in the data frame, and give each distinct date-by-group combination a unique identifying number starting with 1, so that the resulting column would look something like:

    ID <- c(1,2,2,3,4,5)

The loop that I have started to write is something like this (but doesn't work!):

    data$ID <- as.number(c())
    for(i in unique(data$Dates)){
      for(j in unique(data$Groups)){
        data$ID[i,j] <- i
        i <- i+1
      }
    }

Am I on the right track? Any help on this is much appreciated!

Chandra
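A loop may not be needed for the ID column described above. As a rough sketch (the names dat and key are made up for illustration), each date and group can be pasted into a single key, and every row numbered by the position of its key among the unique keys:

```r
# Example data from the post; 'dat' and 'key' are hypothetical names.
dat <- data.frame(
  Dates  = c("12/10/2010", "12/10/2010", "12/10/2010",
             "13/10/2010", "13/10/2010", "13/10/2010"),
  Groups = c("A", "B", "B", "A", "B", "C")
)

# One key per row, combining date and group
key <- paste(dat$Dates, dat$Groups, sep = "_")

# ID = position of each key among the unique keys (order of first appearance)
dat$ID <- match(key, unique(key))
print(dat)
```

The IDs follow the order in which each combination first appears, so this matches the desired numbering only if the rows are already sorted by date and group, as they are in the example.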
Re: [R] stacked bar plot
Many thanks!! That's a million times easier!! :-)

All the best,
Chandra

From: istaz...@gmail.com on behalf of Ista Zahn
Sent: Wed 3/23/2011 12:06 PM
To: Chandra Salgado Kent
Cc: r-help@r-project.org
Subject: Re: [R] stacked bar plot

FWIW, the ggplot option I suggested works fine with sums instead of means...

    library(ggplot2)
    .Table <- data.frame(Sex=c("M","F","M","F","F"), Number=c(10,3,1,2,3), Group_size=c(1,1,2,2,2))
    ggplot(.Table, aes(Group_size, Number, fill=Sex)) + geom_bar(stat="summary", fun.y=sum)

Best,
Ista

On Wed, Mar 23, 2011 at 3:21 AM, Chandra Salgado Kent c.salg...@cmst.curtin.edu.au wrote:
> Hello, Many thanks for your responses! They were very helpful. FYI, ggplot
> didn't work for me because I needed the sum of the values. The fudged
> option of barplot was very helpful.
> [...]

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
[R] stacked bar plot
Hello,

I'm wondering if someone may be able to help me, and I do apologize if there is a simple and obvious solution for this. I am somewhat new to R, and have been searching for a simple solution for a couple of days. I am interested in finding a tool that allows me to plot a stacked bar plot. My data set is in the following format:

    data <- data.frame(Sex=c("M","F","M","F","F"), Number=c(10,3,1,2,3), Group_size=c(1,1,2,2,2))

I would like to have the factor Sex stacked, Group size as a factor on the X axis, and Number on the Y axis (summed so that there is only one value for each Sex by Group_size combination).

Many, many thanks for any help you may be able to offer!

Chandra
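For the summed, stacked layout asked about above, one base R possibility is to cross-tabulate the sums with xtabs() and pass the resulting table to barplot(). The following is a sketch rather than a definitive answer; the names bar_data and totals are invented for this example.

```r
# Example data from the post; 'bar_data' and 'totals' are hypothetical names.
bar_data <- data.frame(
  Sex        = c("M", "F", "M", "F", "F"),
  Number     = c(10, 3, 1, 2, 3),
  Group_size = c(1, 1, 2, 2, 2)
)

# Sum Number for every Sex by Group_size combination:
# rows are the sexes (F, M), columns are the group sizes (1, 2)
totals <- xtabs(Number ~ Sex + Group_size, data = bar_data)

# Each column becomes one bar, with the rows stacked within it
barplot(totals, col = c("pink", "lightblue"),
        xlab = "Group size", ylab = "Number", legend.text = TRUE)
```

With legend.text = TRUE, barplot() takes the legend labels from the row names of the table, so the colours are given in the same order as the rows (F first, then M).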
Re: [R] stacked bar plot
Hello,

Many thanks for your responses! They were very helpful. FYI, ggplot didn't work for me because I needed the sum of the values. The fudged option of barplot was very helpful. Since my matrix is extremely large (the example is a subset), and it would take a lot of time to insert NAs everywhere as you did, I used the main idea you sent but instead summed over group sizes. I'm sure this is far from the most efficient way of doing this, but it was the only way I found for my very large matrix. Thanks again!!

Here is my solution:

    #-
    .Table <- data.frame(Sex=c("M","F","M","F","F"), Number=c(10,3,1,2,3), Group_size=c(1,1,2,2,2))

    #I separated the females first, and ordered them by group size
    Females <- subset(.Table, Sex=="F")
    .Order <- order(Females$Group_size)
    FemalesF <- rbind(Females$Group_size, Females$Number)[,.Order]
    FemalesF <- t(FemalesF)

    #I then deleted any NAs which I had in my database, then summed Number
    #for each Group_size and converted it to a matrix
    Females1 <- FemalesF[complete.cases(FemalesF[,2]),]
    Females2 <- by(Females1, Females1[,1], FUN = function(x){ sum(x[,2]) })
    Females3 <- matrix(Females2)

    #I then did the same for the males
    Males <- subset(.Table, Sex=="M")
    .Order <- order(Males$Group_size)
    MalesF <- rbind(Males$Group_size, Males$Number)[,.Order]
    MalesF <- t(MalesF)
    Males1 <- MalesF[complete.cases(MalesF[,2]),]
    Males2 <- by(Males1, Males1[,1], FUN = function(x){ sum(x[,2]) })
    Males3 <- matrix(Males2)

    #I then followed your example in forming a matrix of males and females
    #suitable for barplot and plotted the data
    .Matrix <- matrix(c(Females3,Males3),ncol=2)
    .Matrix <- t(.Matrix)
    barplot(.Matrix, col=c("pink","lightblue"), names.arg=1:2,
            xlab="Group size", ylab="Number", main="Group Sex")
    legend("topright", c("Male","Female"), fill=c("lightblue","pink"))
    #

Chandra

From: Jim Lemon [mailto:j...@bitwrit.com.au]
Sent: Tue 3/22/2011 5:55 PM
To: Chandra Salgado Kent
Cc: r-help@r-project.org
Subject: Re: [R] stacked bar plot

On 03/22/2011 06:30 PM, Chandra Salgado Kent wrote:
> [...]
> I would like to have the factor Sex stacked, Group size as a Factor on
> the X axis, and Number on the Y axis (summed so that there is only one
> value for each Sex by Group_size combination).

Hi Chandra,

It's a bit hard to work out exactly what you want, but try this:

    barplot(matrix(c(10,3,NA,1,2,3),ncol=2), col=c("lightblue","pink","pink"),
            names.arg=1:2, xlab="Group size", ylab="Number", main="Group Sex")
    legend(1.6, 8, c("Male","Female"), fill=c("lightblue","pink"))

Now, I have fudged a bit by just making the matrix contain the values in the right order, but if the barplot is what you want, it could get you started.

Jim
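For a large table with missing values, the separate female and male steps above could perhaps be collapsed into a single tapply() call that skips NAs while summing. This is a sketch under stated assumptions: the data frame survey is hypothetical, and its last row (an NA count at group size 3) is made up to show the missing-value handling.

```r
# Hypothetical data: the post's example plus one invented row with a missing count
survey <- data.frame(
  Sex        = c("M", "F", "M", "F", "F", "M"),
  Number     = c(10, 3, 1, 2, 3, NA),
  Group_size = c(1, 1, 2, 2, 2, 3)
)

# Sum Number within each Sex by Group_size cell, ignoring NAs
sums <- tapply(survey$Number, list(survey$Sex, survey$Group_size),
               sum, na.rm = TRUE)

# Cells with no rows at all come back as NA; treat them as zero for plotting
sums[is.na(sums)] <- 0

# Rows (F, M) are stacked within each group-size bar
barplot(sums, col = c("pink", "lightblue"),
        xlab = "Group size", ylab = "Number", legend.text = rownames(sums))
```

Because every Sex by Group_size cell is present in the result, the rows stay aligned across group sizes even when one sex is absent from a group.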
[R] loop for inserting rows in a matrix
Dear R friends,

I have a matrix with 2060 rows and 41 columns. One column is Date, another is Transect, and another is Segment. I want to ensure that there are 9 Transects (1 to 9) for each Date, and 8 Segments (1 to 8) for each Transect in the matrix, by inserting rows where these are missing.

I am new to coding, but am trying to write a loop which checks if each of the transects already exists, and then adds a row in the appropriate place if it doesn't (I have not tackled the segment part, since I am having problems with the Transect part). I have simplified the matrix to show the code I have so far. The code seems to do the right thing for the first date, but not on subsequent dates. The code is:

    AerialSurveysm <- matrix(c("13/06/2006","19/06/2006","19/06/2006","19/06/2006","19/06/2006","19/06/2006","26/06/2006",
                               4,7,7,7,8,8,3,
                               2,5,5,4,4,5,2), nrow = 7, ncol = 3)
    colnames(AerialSurveysm) <- c("Date","Transect","Segment")
    i=1 #start iteration for all dates
    k=2 #start iteration for all transects
    m <- unique(AerialSurveysm[,1])
    for (i in 1:length(m)) { #for each date
      for (k in 1:9) { #do the following for the total number of transects that there are (1 to 9)
        NewDat <- subset(AerialSurveysm, AerialSurveysm[,1]== m[i]) #select date to work on beginning with 1st
        indx <- which(AerialSurveysm[,1]==m[i])
        indx <- indx[[1]]
        Check <- which(NewDat[,2]==k)
        NewRow <- c(c(m[i]),k,0)
        if(is.empty(Check)==TRUE) #if the selected date does not have a transect equal to transect k
          AerialSurveysm <- insertRow(AerialSurveysm,indx,NewRow) #add a row to AerialSurveys.m in the location of the correct date
      }
      i=i+1
    }

Thanks for any hints or thoughts on this (maybe I'm tackling it completely the wrong way!)!

Chandra
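An alternative to inserting rows one at a time might be to build the full set of Date, Transect and Segment combinations with expand.grid() and merge the existing records onto it. This is only a sketch: it assumes the data can be held in a data frame (called surveys here) rather than a character matrix, and the Observed column is a hypothetical flag added to tell real records from filled-in ones.

```r
# Simplified example from the post, as a data frame; 'surveys' is a made-up name
surveys <- data.frame(
  Date     = c("13/06/2006", "19/06/2006", "19/06/2006", "19/06/2006",
               "19/06/2006", "19/06/2006", "26/06/2006"),
  Transect = c(4L, 7L, 7L, 7L, 8L, 8L, 3L),
  Segment  = c(2L, 5L, 5L, 4L, 4L, 5L, 2L),
  stringsAsFactors = FALSE
)
surveys$Observed <- TRUE   # hypothetical flag marking rows that really exist

# Every combination that should be present: 9 transects x 8 segments per date
full_grid <- expand.grid(Date = unique(surveys$Date),
                         Transect = 1:9, Segment = 1:8,
                         stringsAsFactors = FALSE)

# Left join: keep every grid row, attach existing records where they match
completed <- merge(full_grid, surveys,
                   by = c("Date", "Transect", "Segment"), all.x = TRUE)

# Rows that came only from the grid have no record; flag them as not observed
completed$Observed[is.na(completed$Observed)] <- FALSE
```

In the full 41-column data, the other columns would come back as NA for the filled-in rows. Repeated records for the same combination (such as the two rows for transect 7, segment 5 on 19/06/2006) are all kept by the merge.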