Re: [datatable-help] calculating lagged values of panel data using data.table

Jelmer Ypma Fri, 15 Jul 2011 13:13:53 -0700

Hi,

many thanks for your reply, the Cross Join bit is very useful to know!


Thanks,
Jelmer

On Fri, Jul 15, 2011 at 20:36, Matthew Dowle <[email protected]> wrote:
> Hi,
>
> Welcome!
>
> Yes, that's safe to assume. Others have asked that in the past, so I've
> now added this text to ?data.table (under 'by'):  "The order of the rows
> within each group is preserved."
>
> The tmp.keys operation is a common one, so there's a short cut for it.
> We call it a Cross Join. If you run the CJ() bit on it's own you'll find
> it's the same as tmp.keys :
>
>> DT[CJ(id.vec,time.vec)]
>      id time value
>  [1,]  1    1     7
>  [2,]  1    2     6
>  [3,]  1    3    NA
>  [4,]  1    4     5
>  [5,]  1    5     4
>  [6,]  2    1    NA
>  [7,]  2    2     3
>  [8,]  2    3     2
>  [9,]  2    4     1
> [10,]  2    5    NA
>>
>
> Rather than filling the gaps with NA, you can roll the prevailing
> observation forward :
>
>> DT[CJ(id.vec,time.vec),roll=TRUE]
>      id time value
>  [1,]  1    1     7
>  [2,]  1    2     6
>  [3,]  1    3     6
>  [4,]  1    4     5
>  [5,]  1    5     4
>  [6,]  2    1    NA     # correct not to be 4 (new group)
>  [7,]  2    2     3
>  [8,]  2    3     2
>  [9,]  2    4     1
> [10,]  2    5     1
>>
>
> and that's useful because, now, instead of lagging in the way you did
> it, you can just subtract one day from your dates or times, and find the
> prevailing record then. It is subtly different of course, it depends on
> what you want and what assumptions you make about the data.
>
>> DT[CJ(id.vec,time.vec-1),roll=TRUE]
>      id time value
>  [1,]  1    0    NA
>  [2,]  1    1     7
>  [3,]  1    2     6
>  [4,]  1    3     6
>  [5,]  1    4     5
>  [6,]  2    0    NA
>  [7,]  2    1    NA
>  [8,]  2    2     3
>  [9,]  2    3     2
> [10,]  2    4     1
>>
>
> On your final line I didn't see why you chained it into two queries, but
> fortunately I tested before posting and it seems there is a bug :
>
>> DT[ tmp.keys,     # removed your "][," here
> + list(time,value,'value.lag'=c(NA,value[-length(value)])), by=id ]
> Error in abs(j) : Non-numeric argument to mathematical function
> In addition: Warning message:
> In is.na(j) : is.na() applied to non-(list or vector) of type 'NULL'
>>
>
> That should work fine, but doesn't. I'll raise it as a bug ...
>
> So, not bad going for a first post!
>
> Thanks,
> Matthew
>
>
> On Fri, 2011-07-15 at 11:33 +0200, Jelmer Ypma wrote:
>> Dear all,
>>
>> I found out about the data.table package yesterday, and I've been very
>> impressed with the speed so far.
>>
>> I'm currently trying to calculate lagged values of a variable for some
>> panel data set. The approach I take is the one below, but I'm not sure
>> whether that is correct, and whether it is safe to do in this way
>> (e.g. if data is re-ordered).
>>
>> Hopefully, one of you will be able to guide me in the right direction.
>>
>> Many thanks in advance,
>> Jelmer
>>
>> # create some data, with missing periods
>> DT <- data.table( 'id'=c(1,1,1,1,2,2,2), 'time'=c(1,2,4,5,2,3,4),
>> 'value'=c(7,6,5,4,3,2,1) )
>> setkey( DT, id, time )
>>
>> # extract id and time values to construct data.table with keys,
>> including missing periods
>> id.vec <- unique( DT$id )
>> time.vec <- unique( DT$time )
>> num.periods <- max( time.vec ) - min( time.vec ) + 1
>> tmp.keys <- data.table( 'id'=rep( id.vec, each=num.periods ),
>> 'time'=rep( time.vec, length(id.vec) ) )
>> setkey( tmp.keys, id, time )
>>
>> # show data
>> DT                                                  # original data
>> DT[ tmp.keys ]                                      # fill in missing
>> periods with NA
>>
>> # shift 'value' column by id to get lagged 'value'
>> DT[ tmp.keys ][,
>> list(time,value,'value.lag'=c(NA,value[-length(value)])), by=id ]
>> _______________________________________________
>> datatable-help mailing list
>> [email protected]
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>
>
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

Re: [datatable-help] calculating lagged values of panel data using data.table

Reply via email to