Here is a modification that should now find the closest:

> myvscan<-data.frame(c(1,NA,1.5),as.POSIXct(c("12:00:00","12:14:00","12:20:00"),
+     format="%H:%M:%S"))
> # convert to numeric
> names(myvscan)<-c("Latitude","DateTime")
> myvscan$tn <- as.numeric(myvscan$DateTime)  # numeric for findInterval
> mygarmin<-data.frame(c(20,30,40),as.POSIXct(c("12:00:00","12:10:00","12:15:00"),
+     format="%H:%M:%S"))
> names(mygarmin)<-c("Latitude","DateTime")
> mygarmin$tn <- as.numeric(mygarmin$DateTime)
>
> # use 'findInterval'
> na.indx <- which(is.na(myvscan$Latitude))  # find NAs
>
> # create matrix of values to test the range
> indices <- findInterval(myvscan$tn[na.indx],mygarmin$tn)
> x <- cbind(indices,
+     abs(myvscan$tn[na.indx] - mygarmin$tn[indices]),  # lower
+     abs(myvscan$tn[na.indx] - mygarmin$tn[indices + 1]))  # higher
> # now determine which index is closer
> closest <- x[,1] + (x[,2] > x[,3])  # determine the proper index
> # replace with garmin latitude
> myvscan$Latitude[na.indx] <- mygarmin$Latitude[closest]
> myvscan
  Latitude            DateTime         tn
1      1.0 2009-05-23 12:00:00 1243080000
2     40.0 2009-05-23 12:14:00 1243080840
3      1.5 2009-05-23 12:20:00 1243081200
>
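The version above assumes every missing time falls strictly inside the garmin range; a time before the first or after the last garmin record would produce an index of 0 or an NA comparison. The original question also asked for matches no more than 5 minutes away. Below is a minimal sketch of one way to cover both cases. The helper name nearest_index and its max_gap argument are made up for illustration and are not part of the thread; times are given as plain seconds after 12:00 so the example does not depend on the current date.

```r
# Hypothetical helper (not from the thread): for each query time, return the
# index of the nearest reference time, or NA if the nearest one is farther
# away than max_gap. 'ref' must be sorted in increasing order.
nearest_index <- function(query, ref, max_gap = Inf) {
  n  <- length(ref)
  lo <- pmax(findInterval(query, ref), 1L)  # last ref <= query, clamped to 1
  hi <- pmin(lo + 1L, n)                    # next ref, clamped to n
  # pick the upper neighbour only when it is strictly closer
  idx <- ifelse(abs(query - ref[hi]) < abs(query - ref[lo]), hi, lo)
  idx[abs(query - ref[idx]) > max_gap] <- NA  # too far away to use
  idx
}

# Test data in seconds after 12:00 (12:00, 12:14, 12:20, 12:50)
vscan  <- data.frame(Latitude = c(1, NA, 1.5, NA), tn = c(0, 840, 1200, 3000))
# Garmin fixes at 12:00, 12:10, 12:15
garmin <- data.frame(Latitude = c(20, 30, 40), tn = c(0, 600, 900))

na.indx <- which(is.na(vscan$Latitude))
idx <- nearest_index(vscan$tn[na.indx], garmin$tn, max_gap = 5 * 60)
vscan$Latitude[na.indx] <- garmin$Latitude[idx]
```

With this test data the 12:14 record takes the 12:15 garmin latitude (40), while the 12:50 record stays NA because the nearest garmin fix is more than 5 minutes away. Ties go to the earlier record, as in the code above.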
On Fri, May 22, 2009 at 7:39 PM, Tim Clark <mudiver1...@yahoo.com> wrote:

> Jim,
>
> Thanks! I like the way you use indexing instead of the loops. However,
> the findInterval function does not give the right result. I have been
> playing with it and it seems to give the closest number that is less than
> the one of interest. In this case, the correct replacement should have been
> 40, not 30, since 12:15 from mygarmin is closer to 12:14 in myvscan than
> 12:10. Is there a way to get the function to find the closest in value
> instead of the next smaller value? I was trying to use which.min to get the
> closest date but can't seem to get it to work right either.
>
> Aloha,
>
> Tim
>
> Tim Clark
> Department of Zoology
> University of Hawaii
>
> --- On Fri, 5/22/09, jim holtman <jholt...@gmail.com> wrote:
>
> > From: jim holtman <jholt...@gmail.com>
> > Subject: Re: [R] Need a faster function to replace missing data
> > To: "Tim Clark" <mudiver1...@yahoo.com>
> > Cc: r-help@r-project.org
> > Date: Friday, May 22, 2009, 7:24 AM
> >
> > I think this does what you want. It uses 'findInterval' to determine
> > where a possible match is:
> >
> > > myvscan<-data.frame(c(1,NA,1.5),as.POSIXct(c("12:00:00","12:14:00","12:20:00"),
> > +     format="%H:%M:%S"))
> > > # convert to numeric
> > > names(myvscan)<-c("Latitude","DateTime")
> > > myvscan$tn <- as.numeric(myvscan$DateTime)  # numeric for findInterval
> > > mygarmin<-data.frame(c(20,30,40),as.POSIXct(c("12:00:00","12:10:00","12:15:00"),
> > +     format="%H:%M:%S"))
> > > names(mygarmin)<-c("Latitude","DateTime")
> > > mygarmin$tn <- as.numeric(mygarmin$DateTime)
> > >
> > > # use 'findInterval'
> > > na.indx <- which(is.na(myvscan$Latitude))  # find NAs
> > > # replace with garmin latitude
> > > myvscan$Latitude[na.indx] <-
> > +     mygarmin$Latitude[findInterval(myvscan$tn[na.indx], mygarmin$tn)]
> > > myvscan
> >   Latitude            DateTime         tn
> > 1      1.0 2009-05-22 12:00:00 1243008000
> > 2     30.0 2009-05-22 12:14:00 1243008840
> > 3      1.5 2009-05-22 12:20:00 1243009200
> > >
> >
> > On Fri, May 22, 2009 at 12:45 AM, Tim Clark <mudiver1...@yahoo.com> wrote:
> >
> > Dear List,
> >
> > I need some help in coming up with a function that will take two data
> > sets, determine if a value is missing in one, find a value in the second
> > that was taken at about the same time, and substitute the second value
> > in for where the first should have been. My problem is from a fish
> > tracking study. We put acoustic tags in fish and track them for several
> > days. Location data is supposed to be automatically recorded every time
> > we detect a "ping" from the fish. Unfortunately the GPS had some
> > problems and sometimes the fish's depth was recorded but not its
> > location. I fortunately had a back-up GPS that was taking location data
> > every five minutes. I would like to merge the two files, replacing the
> > missing value in the vscan (automatic) file with the location from the
> > garmin file. Since we were getting vscan records every 1-2 seconds and
> > garmin records every 5 minutes, I need to find the right place in the
> > vscan file to place the garmin record, i.e. the closest in time, but
> > not greater than 5 minutes. I have written a function that does this.
> > However, it works with my test data but locks up my computer with my
> > real data. I have several million vscan records and several thousand
> > garmin records. Is there a better way to do this?
> >
> > My function and test data:
> >
> > library(chron)  # provides times()
> >
> > myvscan<-data.frame(c(1,NA,1.5),times(c("12:00:00","12:14:00","12:20:00")))
> > names(myvscan)<-c("Latitude","DateTime")
> >
> > mygarmin<-data.frame(c(20,30,40),times(c("12:00:00","12:10:00","12:15:00")))
> > names(mygarmin)<-c("Latitude","DateTime")
> >
> > minute.diff<-1/24/12  # Time diff is in days, so this is 5 minutes
> >
> > for (k in 1:nrow(myvscan))
> > {
> >   if (is.na(myvscan$Latitude[k]))
> >   {
> >     if ((min(abs(mygarmin$DateTime-myvscan$DateTime[k]))) < minute.diff )
> >     {
> >       index.min.date<-which.min(abs(mygarmin$DateTime-myvscan$DateTime[k]))
> >       myvscan$Latitude[k]<-mygarmin$Latitude[index.min.date]
> >     }
> >   }
> > }
> >
> > I appreciate your help and advice.
> >
> > Aloha,
> >
> > Tim
> >
> > Tim Clark
> > Department of Zoology
> > University of Hawaii
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> > --
> > Jim Holtman
> > Cincinnati, OH
> > +1 513 646 9390
> >
> > What is the problem that you are trying to solve?

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?