Hello,

If I understand it correctly, something like this will get you what you want.


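set.seed(1)                         # make sample() and rnorm() reproducible
d <- Sys.Date() + 1:4               # four consecutive dates, starting tomorrow
d2 <- sample(d, 2)                  # two of those dates again, creating duplicates
dat <- data.frame(id = 1:6, date = c(d, d2), value = rnorm(6))

# One row per distinct date: tail(x, 1) keeps the last row of each group
aggregate(dat, by = list(dat$date), FUN = tail, 1)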
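Note that the example above groups by `date`, and `tail(x, 1)` keeps the last row of each group rather than the oldest one. For the problem as described in the quoted question (one row per tracking number, keeping the oldest entry date), a minimal sketch is to compute the oldest date per `REQ.NR` with `ave()` and keep the rows that match it. The data frame `ship` is hypothetical, with the date in column 1 and `REQ.NR` in column 2 as in the question. Like the original loop, this version keeps every row tied at the oldest date and drops rows with a missing tracking number.

```r
# Hypothetical shipment data: entry date in column 1, tracking number in column 2
ship <- data.frame(
  date   = as.Date(c("2012-09-01", "2012-09-03", "2012-09-02",
                     "2012-09-05", "2012-09-04", "2012-09-06")),
  REQ.NR = c("A1", "A1", "B2", "B2", "C3", NA),
  value  = 1:6,
  stringsAsFactors = FALSE
)

# Drop rows without a tracking number, as na.omit() and subset() do in the loop
ship <- ship[!is.na(ship$REQ.NR), ]

# For every row, the oldest date among the rows sharing its tracking number.
# Dates are compared as numbers so ave() only has to handle a plain numeric vector.
oldest <- ave(as.numeric(ship$date), ship$REQ.NR, FUN = min)

# Keep the rows whose date equals that minimum (ties are all kept)
ship.final <- ship[as.numeric(ship$date) == oldest, ]
ship.final
```

Because `ave()` does one grouped pass over the data, it avoids both the repeated `subset()` scans and the repeated `rbind()` calls.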

Hope this helps,

Rui Barradas
On 26-09-2012 16:19, wwreith wrote:
I have several thousand rows of shipment data imported into R as a data
frame, with two columns of particular interest: column 1 is the entry date
and column 2 is the tracking number (column name REQ.NR). Tracking numbers
should be unique, but on occasion they aren't because they get entered more
than once. This creates two or more rows with the same tracking number but
different dates. I wrote a for loop that keeps the row with the oldest date,
but it is extremely slow.

Any suggestions of how I should write this so that it is faster?

# Create a vector of the unique tracking numbers #
u <- na.omit(unique(Para.5C$REQ.NR))

# Create an empty data frame to rbind the unique rows to #
Para.5C.final <- data.frame()

# For each value in u, subset Para.5C, find the min date and rbind that row
# to Para.5C.final #
for (i in seq_along(u))
{
   x <- subset(Para.5C, Para.5C$REQ.NR == u[i])
   Para.5C.final <- rbind(Para.5C.final, x[which(x[, 1] == min(x[, 1])), ])
}
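Much of the time in a loop like this likely goes into `subset()` scanning the whole data frame once per tracking number, and `rbind()` copying the growing `Para.5C.final` on every pass. A common vectorized alternative is to sort by date and then drop every later duplicate of `REQ.NR` with `duplicated()`. This sketch assumes one row per tracking number is enough: unlike the loop, it keeps only one of several rows tied at the oldest date. The helper name `keep_oldest` and the example data are hypothetical.

```r
# Hypothetical helper: keep the row with the oldest date for each tracking number.
# Assumes the date is in column 1 (as in the question) and the tracking number
# is in a column named REQ.NR; rows with a missing REQ.NR are dropped, as in the loop.
keep_oldest <- function(df) {
  df <- df[!is.na(df$REQ.NR), ]
  df <- df[order(df[[1]]), ]            # oldest dates first
  df[!duplicated(df$REQ.NR), ]          # first (= oldest) row per tracking number
}

# Hypothetical example data
ship <- data.frame(
  date   = as.Date(c("2012-09-03", "2012-09-01", "2012-09-05",
                     "2012-09-02", "2012-09-04")),
  REQ.NR = c("A1", "A1", "B2", "B2", NA),
  value  = 1:5,
  stringsAsFactors = FALSE
)

keep_oldest(ship)
```

Applied to the question's data, this would be `Para.5C.final <- keep_oldest(Para.5C)`. The rows come back sorted by date rather than in their original order.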



--
View this message in context: 
http://r.789695.n4.nabble.com/Removing-duplicates-without-a-for-loop-tp4644255.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.