PEL wrote on 12/07/2011 02:37:42 PM:

> Hi all,
>
> I have a dataframe that was created from the fusion of two dataframes. Both
> spanned the same time interval but contained different information.
> When I put them together, the info overlapped since there are no holes in the
> time interval of one of the dataframes. Here is an example where the rows
> "sp=A and B" are part of a first df and the rows "sp=C" come from a second.
> The first dataframe is continuous but the second consists of sporadic
> events. The final dataframe looks like this:
>
> start                end                  sp
> 2010-06-01 17:00:00  2010-06-01 19:30:00  A
> 2010-06-01 19:30:01  2010-06-01 20:00:00  B
> 2010-06-01 19:45:00  2010-06-01 19:55:00  C
> 2010-06-01 20:00:01  2010-06-01 20:30:00  A
> 2010-06-01 20:05:00  2010-06-01 20:10:00  C
> 2010-06-01 20:12:00  2010-06-01 20:15:00  C
> 2010-06-01 20:30:01  2010-06-01 20:40:00  B
> 2010-06-01 20:35:00  2010-06-01 20:40:10  C
> 2010-06-01 20:40:01  2010-06-01 20:50:00  A
>
> I would like to prioritize "C" so that when it overlaps the time interval of
> another "sp", the time interval of "A" or "B" is cut accordingly. As seen in
> the example, I sometimes have multiple events of "C" that overlap a single
> event of "A" or "B". The result would be this:
>
> start                end                  sp
> 2010-06-01 17:00:00  2010-06-01 19:30:00  A
> 2010-06-01 19:30:01  2010-06-01 19:44:59  B
> 2010-06-01 19:45:00  2010-06-01 19:55:00  C
> 2010-06-01 19:55:01  2010-06-01 20:00:00  B
> 2010-06-01 20:00:01  2010-06-01 20:04:59  A
> 2010-06-01 20:05:00  2010-06-01 20:10:00  C
> 2010-06-01 20:10:01  2010-06-01 20:11:59  A
> 2010-06-01 20:12:00  2010-06-01 20:15:00  C
> 2010-06-01 20:15:01  2010-06-01 20:30:00  A
> 2010-06-01 20:30:01  2010-06-01 20:34:59  B
> 2010-06-01 20:35:00  2010-06-01 20:40:10  C
> 2010-06-01 20:40:11  2010-06-01 20:50:00  A
>
> My date/time columns are in POSIXct. Don't hesitate to ask if something is
> unclear.
>
> Thanks in advance

The code below isn't pretty, but it works, at least on the example you
provided. It's helpful to provide your example data as working code, for
example using the function dput().

Jean


df1 <- structure(list(
    start = structure(c(1275429600, 1275438601, 1275439500, 1275440401,
        1275440700, 1275441120, 1275442201, 1275442500, 1275442801),
        class = c("POSIXct", "POSIXt")),
    end = structure(c(1275438600, 1275440400, 1275440100, 1275442200,
        1275441000, 1275441300, 1275442800, 1275442810, 1275443400),
        class = c("POSIXct", "POSIXt")),
    sp = c("A", "B", "C", "A", "C", "C", "B", "C", "A")),
    .Names = c("start", "end", "sp"), row.names = c(NA, -9L),
    class = "data.frame")

# rearrange data so that all of the times are in one column
# (stringsAsFactors=FALSE keeps the letters as plain character strings)
len1 <- nrow(df1)
df2 <- data.frame(time=c(df1$start, df1$end),
    point=rep(c("start", "end"), c(len1, len1)),
    letter=c(df1$sp, df1$sp), stringsAsFactors=FALSE)
df2 <- df2[order(df2$time), ]

# create a new variable that indicates what long-segment (A or B)
# other sub-segments (C) are in
len2 <- nrow(df2)
df2$within <- df2$letter
last <- df2$letter[1]
for(i in 2:len2) {
    if(df2$letter[i]=="C") df2$within[i] <- last else last <- df2$letter[i]
}

# for every sub-segment start, add a new long-segment end 1-second before it
newABends <- df2[df2$point=="start" & df2$letter=="C", ]
newABends$time <- newABends$time - 1
newABends$point <- "end"
newABends$letter <- newABends$within

# for every sub-segment end, add a new long-segment start 1-second after it
newABstarts <- df2[df2$point=="end" & df2$letter=="C", ]
newABstarts$time <- newABstarts$time + 1
newABstarts$point <- "start"
newABstarts$letter <- newABstarts$within

# combine the original data with the new long-segment starts and ends
df3 <- rbind(df2, newABends, newABstarts)
df3 <- df3[order(df3$time), ]

# get rid of any long-segment bits within sub-segments
len3 <- nrow(df3)
startC <- seq_len(len3)[df3$point=="start" & df3$letter=="C"]
endC <- seq_len(len3)[df3$point=="end" & df3$letter=="C"]
startendC <- lapply(seq(along=startC),
    function(i) seq(startC[i], endC[i]))
remove.rows <- unlist(lapply(startendC,
    function(x) x[-c(1, length(x))]))
# negative indexing with an empty vector would drop every row,
# so only subset when there is something to remove
df4 <- if(length(remove.rows) > 0) df3[-remove.rows, ] else df3

# rearrange data so that start and end times are in different columns
data.frame(start=df4$time[df4$point=="start"],
    end=df4$time[df4$point=="end"],
    sp=df4$letter[df4$point=="end"])
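
For comparison, the same trimming can also be sketched as interval subtraction: for each A or B row, find the C events that overlap it and keep only the pieces in between. This is a rough sketch rather than a tested replacement for the code above. The names cut_by_priority, hms, and gap are made up for illustration, the example data are rebuilt with tz = "UTC" only so that the printed times are reproducible, and the function assumes whole-second timestamps and C events that do not overlap each other.

```r
# Hypothetical helper (not from the original thread): trim lower-priority
# intervals around the priority ones. Assumes whole-second timestamps and
# priority intervals that do not overlap each other.
cut_by_priority <- function(df, priority = "C", gap = 1) {
  pri <- df[df$sp == priority, ]
  pri <- pri[order(pri$start), ]
  rest <- df[df$sp != priority, ]

  pieces <- lapply(seq_len(nrow(rest)), function(i) {
    s <- rest$start[i]
    e <- rest$end[i]
    # priority intervals that overlap this row
    hit <- pri[pri$start <= e & pri$end >= s, ]
    # candidate pieces: from the row start (or just after a priority end)
    # up to just before the next priority start (or the row end)
    starts <- c(s, hit$end + gap)
    ends <- c(hit$start - gap, e)
    keep <- starts <= ends   # drop pieces swallowed by a priority interval
    if (!any(keep)) return(NULL)
    data.frame(start = starts[keep], end = ends[keep], sp = rest$sp[i],
               stringsAsFactors = FALSE)
  })
  pieces <- Filter(Negate(is.null), pieces)

  out <- rbind(do.call(rbind, pieces), pri)
  out <- out[order(out$start), ]
  rownames(out) <- NULL
  out
}

# Example data from the question, rebuilt with an explicit time zone
hms <- function(x) as.POSIXct(paste("2010-06-01", x), tz = "UTC")
df1 <- data.frame(
  start = hms(c("17:00:00", "19:30:01", "19:45:00", "20:00:01", "20:05:00",
                "20:12:00", "20:30:01", "20:35:00", "20:40:01")),
  end = hms(c("19:30:00", "20:00:00", "19:55:00", "20:30:00", "20:10:00",
              "20:15:00", "20:40:00", "20:40:10", "20:50:00")),
  sp = c("A", "B", "C", "A", "C", "C", "B", "C", "A"),
  stringsAsFactors = FALSE)

res <- cut_by_priority(df1)
print(res)
```

On the example data this should give the twelve rows listed in the question. If C events can overlap one another, they would need to be merged first, which this sketch does not attempt.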