PEL wrote on 12/07/2011 02:37:42 PM:

> Hi all,
> 
> I have a dataframe that was created by merging two dataframes. Both
> spanned the same time interval but contained different information.
> When I put them together, the info overlapped since there are no holes in
> the time interval of one of the dataframes. Here is an example where the
> rows "sp=A and B" are part of a first df and the rows "sp=C" come from a
> second. The first dataframe is continuous but the second consists of
> sporadic events. The final dataframe looks like this:
> 
> start                  end                         sp
> 2010-06-01 17:00:00    2010-06-01 19:30:00         A
> 2010-06-01 19:30:01    2010-06-01 20:00:00         B
> 2010-06-01 19:45:00    2010-06-01 19:55:00         C
> 2010-06-01 20:00:01    2010-06-01 20:30:00         A
> 2010-06-01 20:05:00    2010-06-01 20:10:00         C
> 2010-06-01 20:12:00    2010-06-01 20:15:00         C
> 2010-06-01 20:30:01    2010-06-01 20:40:00         B
> 2010-06-01 20:35:00    2010-06-01 20:40:10         C
> 2010-06-01 20:40:01    2010-06-01 20:50:00         A
> 
> I would like to prioritize "C" so when it overlaps the time interval of
> another "sp", the time interval of "A" or "B" is cut accordingly. As seen in
> the example, I sometimes have multiple events of "C" that overlap a single
> event of "A" or "B". The result would be this:
> 
> start                  end                         sp
> 2010-06-01 17:00:00    2010-06-01 19:30:00         A
> 2010-06-01 19:30:01    2010-06-01 19:44:59         B
> 2010-06-01 19:45:00    2010-06-01 19:55:00         C
> 2010-06-01 19:55:01    2010-06-01 20:00:00         B
> 2010-06-01 20:00:01    2010-06-01 20:04:59         A
> 2010-06-01 20:05:00    2010-06-01 20:10:00         C
> 2010-06-01 20:10:01    2010-06-01 20:11:59         A
> 2010-06-01 20:12:00    2010-06-01 20:15:00         C
> 2010-06-01 20:15:01    2010-06-01 20:30:00         A
> 2010-06-01 20:30:01    2010-06-01 20:34:59         B
> 2010-06-01 20:35:00    2010-06-01 20:40:10         C
> 2010-06-01 20:40:11    2010-06-01 20:50:00         A
> 
> My date/time columns are in POSIXct. Don't hesitate to ask if something is
> unclear.
> 
> Thanks in advance


The code below isn't pretty, but it works, at least on the example you
provided.

It's helpful to provide your example data as working code, for example
using the function dput().

Jean


df1 <- structure(list(start = structure(c(1275429600, 1275438601, 
1275439500, 
1275440401, 1275440700, 1275441120, 1275442201, 1275442500, 1275442801), 
class = c("POSIXct", "POSIXt")), end = structure(c(1275438600, 1275440400,
1275440100, 1275442200, 1275441000, 1275441300, 1275442800, 1275442810,
1275443400), class = c("POSIXct", "POSIXt")), sp = c("A", 
"B", "C", "A", "C", "C", "B", "C", "A")), .Names = c("start", 
"end", "sp"), row.names = c(NA, -9L), class = "data.frame")

# rearrange data so that all of the times are in one column
len1 <- dim(df1)[1] 
df2 <- data.frame(time=c(df1$start, df1$end), point=rep(c("start", "end"), 
c(len1, len1)), letter=c(df1$sp, df1$sp))
df2 <- df2[order(df2$time), ]

# create a new variable that indicates what long-segment (A or B) other
# sub-segments (C) are in
len2 <- dim(df2)[1]
df2$within <- df2$letter
last <- df2$letter[1]
for(i in 2:len2) {
  if(df2$letter[i]=="C") df2$within[i] <- last else last <- df2$letter[i]
}

# for every sub-segment start, add a new long-segment end 1-second before it
newABends <- df2[df2$point=="start" & df2$letter=="C", ]
newABends$time <- newABends$time - 1
newABends$point <- "end"
newABends$letter <- newABends$within

# for every sub-segment end, add a new long-segment start 1-second after it
newABstarts <- df2[df2$point=="end" & df2$letter=="C", ]
newABstarts$time <- newABstarts$time + 1
newABstarts$point <- "start"
newABstarts$letter <- newABstarts$within

# combine the original data with the new long-segment starts and ends
df3 <- rbind(df2, newABends, newABstarts)
df3 <- df3[order(df3$time), ]

# get rid of any long-segment bits within sub-segments
len3 <- dim(df3)[1]
startC <- seq_len(len3)[df3$point=="start" & df3$letter=="C"]
endC <- seq_len(len3)[df3$point=="end" & df3$letter=="C"]
startendC <- lapply(seq(along=startC), function(i) seq(startC[i], endC[i]))
remove.rows <- unlist(lapply(startendC, function(x) x[-c(1, length(x))]))
# negative indexing with an empty vector would drop every row, so guard it
df4 <- if(length(remove.rows) > 0) df3[-remove.rows, ] else df3

# rearrange data so that start and end times are in different columns
# (drop=FALSE keeps df4s a data frame, so that names() sets its column name)
df4s <- df4[df4$point=="start", "time", drop=FALSE]
df4e <- df4[df4$point=="end", c("time", "letter")]
names(df4s) <- "start"
names(df4e) <- c("end", "letter")
cbind(df4s, df4e)