PEL wrote on 12/07/2011 02:37:42 PM:

> Hi all,
> I have a dataframe that was created from the fusion of two dataframes.
> They spanned the same time interval but contained different information.
> When I put them together, the info overlapped, since there are no holes in
> the time interval of one of the dataframes. Here is an example where the rows
> "sp=A and B" are part of a first df and the rows "sp=C" come from a
> second df. The first dataframe is continuous, but the second consists of
> sporadic events. The final dataframe looks like this:
> start                 end                   sp
> 2010-06-01 17:00:00   2010-06-01 19:30:00   A
> 2010-06-01 19:30:01   2010-06-01 20:00:00   B
> 2010-06-01 19:45:00   2010-06-01 19:55:00   C
> 2010-06-01 20:00:01   2010-06-01 20:30:00   A
> 2010-06-01 20:05:00   2010-06-01 20:10:00   C
> 2010-06-01 20:12:00   2010-06-01 20:15:00   C
> 2010-06-01 20:30:01   2010-06-01 20:40:00   B
> 2010-06-01 20:35:00   2010-06-01 20:40:10   C
> 2010-06-01 20:40:01   2010-06-01 20:50:00   A
> I would like to prioritize "C" so that when it overlaps the time interval of
> another "sp", the time interval of "A" or "B" is cut accordingly. As seen in
> the example, I sometimes have multiple events of "C" that overlap a single
> event of "A" or "B". The result would be this:
> start                 end                   sp
> 2010-06-01 17:00:00   2010-06-01 19:30:00   A
> 2010-06-01 19:30:01   2010-06-01 19:44:59   B
> 2010-06-01 19:45:00   2010-06-01 19:55:00   C
> 2010-06-01 19:55:01   2010-06-01 20:00:00   B
> 2010-06-01 20:00:01   2010-06-01 20:04:59   A
> 2010-06-01 20:05:00   2010-06-01 20:10:00   C
> 2010-06-01 20:10:01   2010-06-01 20:11:59   A
> 2010-06-01 20:12:00   2010-06-01 20:15:00   C
> 2010-06-01 20:15:01   2010-06-01 20:30:00   A
> 2010-06-01 20:30:01   2010-06-01 20:34:59   B
> 2010-06-01 20:35:00   2010-06-01 20:40:10   C
> 2010-06-01 20:40:11   2010-06-01 20:50:00   A
> My date/time columns are in POSIXct. Don't hesitate to ask if something is
> unclear.
> Thanks in advance

The code below isn't pretty, but it works, at least on the example you provided.

It's helpful to provide your example data as working code, for example 
using the function dput().
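For instance, if you start from the printed table, you could rebuild the data frame with as.POSIXct() and let dput() print the code for you. This is only a sketch: the name ex is made up, and the times are parsed in UTC because your time zone isn't stated, so the numbers dput() prints will not match the ones in df1 below.

```r
# Rebuild the example from the printed table. The time zone is set to UTC
# because the original zone isn't stated (an assumption).
ex <- data.frame(
  start = as.POSIXct(paste("2010-06-01",
            c("17:00:00", "19:30:01", "19:45:00", "20:00:01", "20:05:00",
              "20:12:00", "20:30:01", "20:35:00", "20:40:01")),
            tz = "UTC", format = "%Y-%m-%d %H:%M:%S"),
  end = as.POSIXct(paste("2010-06-01",
            c("19:30:00", "20:00:00", "19:55:00", "20:30:00", "20:10:00",
              "20:15:00", "20:40:00", "20:40:10", "20:50:00")),
            tz = "UTC", format = "%Y-%m-%d %H:%M:%S"),
  sp = c("A", "B", "C", "A", "C", "C", "B", "C", "A"),
  stringsAsFactors = FALSE)

# dput() prints a structure() call that recreates 'ex';
# paste that output into the message instead of the printed table
dput(ex)
```

Whoever answers can then paste the structure() output straight into R and get the same object, including the POSIXct class and the tzone attribute.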


# POSIXct values print in the session's time zone; they match the table
# above in a UTC-5 zone
df1 <- structure(list(start = structure(c(1275429600, 1275438601, 
1275439500, 1275440401, 1275440700, 1275441120, 1275442201, 1275442500, 
1275442801), class = c("POSIXct", "POSIXt")), end = structure(c(1275438600, 
1275440400, 1275440100, 1275442200, 1275441000, 1275441300, 1275442800, 
1275442810, 1275443400), class = c("POSIXct", "POSIXt")), sp = c("A", 
"B", "C", "A", "C", "C", "B", "C", "A")), .Names = c("start", 
"end", "sp"), row.names = c(NA, -9L), class = "data.frame")

# rearrange data so that all of the times are in one column
len1 <- dim(df1)[1] 
df2 <- data.frame(time=c(df1$start, df1$end), point=rep(c("start", "end"), 
c(len1, len1)), letter=c(df1$sp, df1$sp))
df2 <- df2[order(df2$time), ]

# create a new variable that indicates which long segment (A or B) each
# sub-segment (C) falls in
len2 <- dim(df2)[1]
df2$within <- df2$letter
last <- df2$letter[1]
for(i in 2:len2) if(df2$letter[i]=="C") df2$within[i] <- last else last <- df2$letter[i]

# for every sub-segment start, add a new long-segment end 1 second before it
newABends <- df2[df2$point=="start" & df2$letter=="C", ]
newABends$time <- newABends$time - 1
newABends$point <- "end"
newABends$letter <- newABends$within

# for every sub-segment end, add a new long-segment start 1 second after it
newABstarts <- df2[df2$point=="end" & df2$letter=="C", ]
newABstarts$time <- newABstarts$time + 1
newABstarts$point <- "start"
newABstarts$letter <- newABstarts$within

# combine the original data with the new long-segment starts and ends
df3 <- rbind(df2, newABends, newABstarts)
df3 <- df3[order(df3$time), ]

# get rid of any long-segment bits within sub-segments
len3 <- dim(df3)[1]
startC <- seq(from=len3)[df3$point=="start" & df3$letter=="C"]
endC <- seq(from=len3)[df3$point=="end" & df3$letter=="C"]
startendC <- lapply(seq(along=startC), function(i) seq(startC[i], endC[i]))
remove.rows <- unlist(lapply(startendC, function(x) x[-c(1, length(x))]))
# guard: df3[-integer(0), ] would drop every row
df4 <- if(length(remove.rows) > 0) df3[-remove.rows, ] else df3

# rearrange data so that start and end times are in different columns
# drop=FALSE keeps df4s a data frame, so its column name survives cbind()
df4s <- df4[df4$point=="start", "time", drop=FALSE]
df4e <- df4[df4$point=="end", c("time", "letter")]
names(df4s) <- "start"
names(df4e) <- c("end", "sp")
cbind(df4s, df4e)
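For comparison, here is a sketch of a different technique (interval subtraction, not the approach above): treat each A/B row as an interval and cut out every C event that overlaps it, walking the events in time order. The names split_by_priority() and at() are made up for this example. It assumes whole-second times and closed intervals, hence the -1 / +1 second offsets used in your desired output, and it keeps the C rows unchanged.

```r
# Hypothetical helper: cut each non-priority interval around the priority
# events that overlap it. Assumes whole-second times and closed intervals,
# so a cut piece ends 1 s before an event and resumes 1 s after it.
split_by_priority <- function(df, priority = "C") {
  tz <- attr(df$start, "tzone")          # may be NULL (local time)
  s  <- as.numeric(df$start)
  e  <- as.numeric(df$end)
  sp <- as.character(df$sp)              # works for factor or character
  isP <- sp == priority
  # priority events, sorted by start time
  o  <- order(s[isP])
  pS <- s[isP][o]
  pE <- e[isP][o]

  outS <- numeric(0); outE <- numeric(0); outSp <- character(0)
  for (i in which(!isP)) {
    cursor <- s[i]
    # events that overlap row i, in time order
    for (j in which(pS <= e[i] & pE >= s[i])) {
      if (pS[j] > cursor) {              # keep the piece before event j
        outS <- c(outS, cursor); outE <- c(outE, pS[j] - 1)
        outSp <- c(outSp, sp[i])
      }
      cursor <- max(cursor, pE[j] + 1)   # resume 1 s after event j
    }
    if (cursor <= e[i]) {                # keep whatever is left at the end
      outS <- c(outS, cursor); outE <- c(outE, e[i])
      outSp <- c(outSp, sp[i])
    }
  }
  # add the priority events unchanged and sort everything by start time
  outS  <- c(outS, pS)
  outE  <- c(outE, pE)
  outSp <- c(outSp, rep(priority, length(pS)))
  o <- order(outS)
  data.frame(start = .POSIXct(outS[o], tz = tz),
             end   = .POSIXct(outE[o], tz = tz),
             sp    = outSp[o], stringsAsFactors = FALSE)
}

# Example data from the question, parsed as UTC (an assumption)
at <- function(hms) as.POSIXct(paste("2010-06-01", hms), tz = "UTC",
                               format = "%Y-%m-%d %H:%M:%S")
df1 <- data.frame(
  start = at(c("17:00:00", "19:30:01", "19:45:00", "20:00:01", "20:05:00",
               "20:12:00", "20:30:01", "20:35:00", "20:40:01")),
  end   = at(c("19:30:00", "20:00:00", "19:55:00", "20:30:00", "20:10:00",
               "20:15:00", "20:40:00", "20:40:10", "20:50:00")),
  sp    = c("A", "B", "C", "A", "C", "C", "B", "C", "A"),
  stringsAsFactors = FALSE)

res <- split_by_priority(df1)
print(res)
```

Because each A/B row is handled on its own, a C event that straddles a boundary (like the one from 20:35:00 to 20:40:10) trims both the B row before it and the A row after it.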
