Here is a modification that should now find the closest:

> myvscan<-data.frame(c(1,NA,1.5),as.POSIXct(c("12:00:00","12:14:00","12:20:00"),
+ format="%H:%M:%S"))
> names(myvscan)<-c("Latitude","DateTime")
>
> # convert to numeric for findInterval
> myvscan$tn <- as.numeric(myvscan$DateTime)
>
> mygarmin<-data.frame(c(20,30,40),as.POSIXct(c("12:00:00","12:10:00","12:15:00"),
+ format="%H:%M:%S"))
>
> names(mygarmin)<-c("Latitude","DateTime")
> mygarmin$tn <- as.numeric(mygarmin$DateTime)
>
> # use 'findInterval'
> na.indx <- which(is.na(myvscan$Latitude))  # find NAs
>
> # create matrix of values to test the range
> indices <- findInterval(myvscan$tn[na.indx],mygarmin$tn)
> x <- cbind(indices,
+            abs(myvscan$tn[na.indx] - mygarmin$tn[indices]), # lower
+            abs(myvscan$tn[na.indx] - mygarmin$tn[indices + 1]))  #higher
> # now determine which index is closer
> closest <- x[,1] + (x[,2] > x[,3])  # determine the proper index
> # replace with garmin latitude
> myvscan$Latitude[na.indx] <- mygarmin$Latitude[closest]
>
>
>
> myvscan
  Latitude            DateTime         tn
1      1.0 2009-05-23 12:00:00 1243080000
2     40.0 2009-05-23 12:14:00 1243080840
3      1.5 2009-05-23 12:20:00 1243081200
>
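
One caveat with the transcript above: findInterval returns 0 for a time
earlier than the first garmin record, and indices + 1 runs past the end for a
time later than the last one, so the comparison can misalign or yield NA at
the edges. The transcript also does not enforce the 5-minute limit from the
original question. The following is only a sketch of one way to handle both,
not a tested replacement. The names fill_nearest and max_gap are hypothetical,
and it assumes times are numeric seconds (as as.numeric() gives for POSIXct)
and that the garmin times are sorted.

```r
# Sketch only: nearest-in-time fill with a maximum allowed gap.
# fill_nearest and max_gap are hypothetical names, not from the thread.
fill_nearest <- function(target_t, target_v, ref_t, ref_v, max_gap = 300) {
  stopifnot(!is.unsorted(ref_t))           # findInterval needs sorted breaks
  na_idx <- which(is.na(target_v))
  if (length(na_idx) == 0 || length(ref_t) == 0) return(target_v)

  t  <- target_t[na_idx]
  lo <- findInterval(t, ref_t)             # 0 when t precedes ref_t[1]
  lo_c <- pmax(lo, 1L)                     # clamp lower neighbour into range
  hi_c <- pmin(lo + 1L, length(ref_t))     # clamp upper neighbour into range

  d_lo <- abs(t - ref_t[lo_c])             # gap to lower neighbour
  d_hi <- abs(t - ref_t[hi_c])             # gap to upper neighbour
  closest <- ifelse(d_hi < d_lo, hi_c, lo_c)  # ties go to the lower neighbour

  ok <- pmin(d_lo, d_hi) <= max_gap        # only fill when close enough
  target_v[na_idx[ok]] <- ref_v[closest[ok]]
  target_v
}

# Test data from the thread, as seconds after 12:00:00
vscan_t  <- c(0, 840, 1200)
vscan_v  <- c(1, NA, 1.5)
garmin_t <- c(0, 600, 900)
garmin_v <- c(20, 30, 40)

filled <- fill_nearest(vscan_t, vscan_v, garmin_t, garmin_v)
print(filled)
```

With the real data this would be called as
fill_nearest(myvscan$tn, myvscan$Latitude, mygarmin$tn, mygarmin$Latitude).
Records with no garmin fix within max_gap seconds are left as NA.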


On Fri, May 22, 2009 at 7:39 PM, Tim Clark <mudiver1...@yahoo.com> wrote:

>
> Jim,
>
> Thanks!  I like the way you use indexing instead of the loops.  However,
> the findInterval function does not give the right result.  I have been
> playing with it and it seems to give the closest number that is less than
> the one of interest.  In this case, the correct replacement should have been
> 40, not 30, since 12:15 from mygarmin is closer to 12:14 in myvscan than
> 12:10.  Is there a way to get the function to find the closest in value
> instead of the next smaller value?  I was trying to use which.min to get the
> closest date but can't seem to get it to work right either.
>
> Aloha,
>
> Tim
>
>
> Tim Clark
> Department of Zoology
> University of Hawaii
>
>
> --- On Fri, 5/22/09, jim holtman <jholt...@gmail.com> wrote:
>
> > From: jim holtman <jholt...@gmail.com>
> > Subject: Re: [R] Need a faster function to replace missing data
> > To: "Tim Clark" <mudiver1...@yahoo.com>
> > Cc: r-help@r-project.org
> > Date: Friday, May 22, 2009, 7:24 AM
> > I think this does what you want.  It uses 'findInterval' to determine
> > where a possible match is:
> >
> > > myvscan<-data.frame(c(1,NA,1.5),as.POSIXct(c("12:00:00","12:14:00","12:20:00"),
> > + format="%H:%M:%S"))
> > > names(myvscan)<-c("Latitude","DateTime")
> > >
> > > # convert to numeric for findInterval
> > > myvscan$tn <- as.numeric(myvscan$DateTime)
> > >
> > > mygarmin<-data.frame(c(20,30,40),as.POSIXct(c("12:00:00","12:10:00","12:15:00"),
> > + format="%H:%M:%S"))
> > > names(mygarmin)<-c("Latitude","DateTime")
> > > mygarmin$tn <- as.numeric(mygarmin$DateTime)
> > >
> > > # use 'findInterval'
> > > na.indx <- which(is.na(myvscan$Latitude))  # find NAs
> > >
> > > # replace with garmin latitude
> > > myvscan$Latitude[na.indx] <-
> > +     mygarmin$Latitude[findInterval(myvscan$tn[na.indx], mygarmin$tn)]
> > >
> > > myvscan
> >   Latitude            DateTime         tn
> > 1      1.0 2009-05-22 12:00:00 1243008000
> > 2     30.0 2009-05-22 12:14:00 1243008840
> > 3      1.5 2009-05-22 12:20:00 1243009200
> > >
> >
> >
> >
> > On Fri, May 22, 2009 at 12:45 AM, Tim Clark <mudiver1...@yahoo.com> wrote:
> >
> >
> > Dear List,
> >
> > I need some help coming up with a function that will take two data
> > sets, determine if a value is missing in one, find a value in the
> > second that was taken at about the same time, and substitute the
> > second value where the first should have been.  My problem is from a
> > fish tracking study.  We put acoustic tags in fish and track them for
> > several days.  Location data is supposed to be recorded automatically
> > every time we detect a "ping" from the fish.  Unfortunately the GPS
> > had some problems and sometimes the fish's depth was recorded but not
> > its location.  Fortunately I had a back-up GPS that was taking
> > location data every five minutes.  I would like to merge the two
> > files, replacing the missing value in the vscan (automatic) file with
> > the location from the garmin file.  Since we were getting vscan
> > records every 1-2 seconds and garmin records every 5 minutes, I need
> > to find the right place in the vscan file to place the garmin record,
> > i.e. the closest in time, but no more than 5 minutes away.  I have
> > written a function that does this.  However, it works with my test
> > data but locks up my computer with my real data.  I have several
> > million vscan records and several thousand garmin records.  Is there
> > a better way to do this?
> >
> >
> >
> > My function and test data:
> >
> >
> > library(chron)  # times() is assumed to come from the chron package
> >
> > myvscan<-data.frame(c(1,NA,1.5),times(c("12:00:00","12:14:00","12:20:00")))
> > names(myvscan)<-c("Latitude","DateTime")
> >
> > mygarmin<-data.frame(c(20,30,40),times(c("12:00:00","12:10:00","12:15:00")))
> > names(mygarmin)<-c("Latitude","DateTime")
> >
> > minute.diff<-1/24/12   # time diff is in days, so this is 5 minutes
> >
> > for (k in 1:nrow(myvscan))
> > {
> >   if (is.na(myvscan$Latitude[k]))
> >   {
> >     # fill only if the nearest garmin record is within 5 minutes
> >     if ((min(abs(mygarmin$DateTime-myvscan$DateTime[k]))) < minute.diff )
> >     {
> >       index.min.date<-which.min(abs(mygarmin$DateTime-myvscan$DateTime[k]))
> >       myvscan$Latitude[k]<-mygarmin$Latitude[index.min.date]
> >     }
> >   }
> > }
> >
> > I appreciate your help and advice.
> >
> > Aloha,
> >
> > Tim
> >
> >
> >
> >
> > Tim Clark
> > Department of Zoology
> > University of Hawaii
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> >
> > --
> > Jim Holtman
> > Cincinnati, OH
> > +1 513 646 9390
> >
> > What is the problem that you are trying to solve?
> >
> >
>
>
>
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
