Hi all, I wrote a bit of code to do something similar: look up the wage of a given individual in a panel in the same quarter of the previous year.
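To make that lookup concrete, here is a minimal sketch in plain Julia, without DataFrames. It assumes a current Julia version, where `missing` plays the role that NA plays in this thread. The column names and values are made up for illustration. The idea is simply a dictionary from the (individual, year, quarter) key to the row number, queried with year - 1.

```julia
# Toy panel; every name and value here is hypothetical, for illustration only.
id      = [1, 1, 1, 1, 2, 2]
year    = [2012, 2012, 2013, 2013, 2012, 2013]
quarter = [1, 2, 1, 2, 1, 1]
wage    = [10.0, 11.0, 12.0, 13.0, 20.0, 21.0]

# Map each (id, year, quarter) key to the row where it occurs.
rowof = Dict((id[i], year[i], quarter[i]) => i for i in eachindex(id))

# For each row, find the row holding the same id and quarter one year earlier.
# `get` returns 0 when the key is absent, which we turn into `missing`.
wage_lag = map(eachindex(id)) do i
    j = get(rowof, (id[i], year[i] - 1, quarter[i]), 0)
    j == 0 ? missing : wage[j]
end
```

Using the tuple itself as the dictionary key avoids hashing by hand, and rows without a match come back as `missing` rather than some arbitrary row's value.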
With data.table I would setkey(data, individual, year, quarter) and then use the amazing J(individual, year-1, quarter). I wanted something similar, and as fast as possible. Here is what I came up with:

    I = (1:1000000) - 1
    d = DataFrame(t = map(x -> rem(x,10), I),
                  n = map(x -> div(x,10), I),
                  x = rand(1:100, 1000000))
    # hash of each row's (n, t) key
    cur = map(hash, zip(d[:n], d[:t]));
    # hash table from key hash to row number
    hh = Dict(cur, 1:1000000);
    # hash of the key to look up: same n, previous t
    trg = map(hash, zip(d[:n], d[:t]-1));
    # fetch the lagged value (falls back to row 1 when the key is not found)
    val = d[:x];
    d[:tlag] = map(x -> val[get(hh,x,1)], trg);

There are a bunch of shortcuts. For instance, if I don't find the value I put the first one, but I should put an NA. Also, the Dictionary would ideally be created within n to make it faster. But this works relatively OK for my needs. Any comments are welcome!

cheers,

t.

On Wednesday, 28 May 2014 10:04:25 UTC-5, Milan Bouchet-Valat wrote:
>
> On Wednesday, 28 May 2014 at 15:49 +0100, Florian Oswald wrote:
> > oh sorry i forgot to mention that it's a panel, so the lag must be by
> > "id" as in the example. it's not a simple time series, but the first
> > lagged entry for each "id" must be NA.
> I think the easiest way is to write a loop, and since you're using Julia
> it may well be faster than convoluted vectorized expressions if written
> carefully. Something like (untested):
>
> df[1, :ylag] = NA
> for i in 2:size(df, 1)
>     if df[i, :id] == df[i - 1, :id]
>         df[i, :ylag] = df[i - 1, :y]
>     else
>         df[i, :ylag] = NA
>     end
> end
>
>
> Regards
>
> > On 28 May 2014 15:45, Florian Oswald wrote:
> > Hi
> >
> > I'm looking for the easiest way to create a lagged variable in
> > a dataframe.
> > I'm almost there with this:
> >
> > df = DataFrame(id=repeat([1:3],inner=[3],outer=[1]),time=repeat([1:3],inner=[1],outer=[3]),y=rand(9))
> >
> > by(df, :id , d -> DataFrame(time=d[:,:time],y=d[:,:y],Ly=[0,d[1:end-1,:y]]))
> >
> > but of course instead of the 0.0 for the first entry for each
> > id I would like to have an NA:
> >
> > by(df, :id , d -> DataFrame(time=d[:,:time],y=d[:,:y],Ly=[NA,d[1:end-1,:y]]))
> >
> > but I can't figure out how to pass a valid data type here. This
> > says "ERROR: no method convert(Type{Float64}, NAtype)"
> >
> > I also tried this:
> >
> > by(df, :id , d -> DataFrame(time=d[:,:time],y=d[:,:y],Ly=@data([NA,d[1:end-1,:y]])))
> >
> > but that gives "ERROR: no method
> > DataArray{T,N}(DataArray{Float64,1}, Array{Bool,1})".
