Dear all, I found out about the data.table package yesterday, and I've been very impressed with the speed so far.
I'm currently trying to calculate lagged values of a variable for a panel data set. My approach is shown below, but I'm not sure whether it is correct, or whether it is safe to do it this way (e.g. if the data is re-ordered). Hopefully one of you can point me in the right direction.

Many thanks in advance,
Jelmer

library(data.table)

# create some data, with missing periods
DT <- data.table( 'id'=c(1,1,1,1,2,2,2), 'time'=c(1,2,4,5,2,3,4), 'value'=c(7,6,5,4,3,2,1) )
setkey( DT, id, time )

# extract id and time values to construct data.table with keys, including missing periods
id.vec <- unique( DT$id )
# use the full range of periods, so a period missing for every id is still included
time.vec <- seq( min( DT$time ), max( DT$time ) )
num.periods <- max( time.vec ) - min( time.vec ) + 1
tmp.keys <- data.table( 'id'=rep( id.vec, each=num.periods ), 'time'=rep( time.vec, length(id.vec) ) )
setkey( tmp.keys, id, time )

# show data
DT               # original data
DT[ tmp.keys ]   # fill in missing periods with NA

# shift 'value' column by id to get lagged 'value'
DT[ tmp.keys ][, list(time,value,'value.lag'=c(NA,value[-length(value)])), by=id ]
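For comparison, here is a minimal sketch of the same lag computed with CJ() and shift(). It assumes a data.table version that provides shift() and the on= argument, which older releases do not have. The names grid, full and res are only illustrative. The explicit setorder() call addresses the re-ordering concern, since the lag then no longer depends on the order in which the rows arrive.

```r
library(data.table)

# Same toy panel as in the post, with missing periods
DT <- data.table(
  id    = c(1, 1, 1, 1, 2, 2, 2),
  time  = c(1, 2, 4, 5, 2, 3, 4),
  value = c(7, 6, 5, 4, 3, 2, 1)
)

# Full id x time grid over the observed range of periods
grid <- CJ(id = unique(DT$id), time = seq(min(DT$time), max(DT$time)))

# Join onto the grid so that missing periods show up as rows with NA value
full <- DT[grid, on = c("id", "time")]

# Sort explicitly, then lag within each id
setorder(full, id, time)
full[, value.lag := shift(value, n = 1L, type = "lag"), by = id]

# Optionally keep only the periods that were observed in the original data
res <- full[!is.na(value)]
res
```

With this data, id 1 at time 4 gets an NA lag because period 3 is missing, rather than picking up the value from period 2.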
