Re: [julia-users] how to create a lagged variable in a DataFrame

Thibaut Lamadon Thu, 25 Sep 2014 13:39:56 -0700

Hi all, 

I wrote a bit of code to do something similar: I needed to get the wage of a
given individual in a panel in the same quarter of the previous year.


With data.table I would setkey(data, individual, year, quarter) and then 
use the amazing J(individual, year-1, quarter).
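For readers who don't use R, the data.table idiom above (key the table on
individual, year and quarter, then look up year - 1) can be sketched in plain
Julia 1.x without any packages. `lag_year` is a hypothetical helper, not code
from this thread. It keys a Dict on the (individual, year, quarter) tuple
itself, so distinct keys can never be confused the way they could if only
their `hash` values were stored. It returns `missing`, the Julia 1.x
counterpart of DataArrays' NA, when no row exists for the previous year.

```julia
# Hypothetical helper (not from the thread): for each row, return the wage of
# the same individual in the same quarter of the previous year, or `missing`.
function lag_year(individual::AbstractVector, year::AbstractVector,
                  quarter::AbstractVector, wage::AbstractVector{T}) where {T}
    # Key every row by its (individual, year, quarter) tuple, like setkey()
    index = Dict(zip(zip(individual, year, quarter), eachindex(wage)))
    out = Vector{Union{Missing, T}}(missing, length(wage))
    for i in eachindex(wage)
        # Same individual and quarter, one year earlier, like J(individual, year-1, quarter)
        j = get(index, (individual[i], year[i] - 1, quarter[i]), nothing)
        if j !== nothing
            out[i] = wage[j]
        end
    end
    return out
end

# Small example panel: two individuals, quarterly wages
individual = [1, 1, 1, 2, 2]
year       = [2000, 2000, 2001, 2000, 2001]
quarter    = [1, 2, 1, 1, 2]
wage       = [10.0, 11.0, 12.0, 20.0, 21.0]
lagged = lag_year(individual, year, quarter, wage)
```

Only the third row has a match (individual 1, quarter 1, year 2000), so every
other entry of `lagged` is `missing`.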

I wanted something similar, and as fast as possible. Here is what I came up
with:

using DataFrames

N = 1_000_000
I = 0:N-1
d = DataFrame(
  t = rem.(I, 10),
  n = div.(I, 10),
  x = rand(1:100, N))


# create an index for location: one hash per (n, t) pair
cur = map(hash, zip(d.n, d.t));

# create a hash table from that hash to the row number (one entry per row)
hh = Dict(zip(cur, 1:N));

# create an index for the target: same n, previous t
trg = map(hash, zip(d.n, d.t .- 1));

# map the evaluation (falls back to row 1 when no target row exists)
val = d.x;
d.tlag = map(x -> val[get(hh, x, 1)], trg);

There are a few shortcuts. For instance, if I don't find the value I use the
first one, but I should put an NA there. Also, the dictionary would ideally be
built separately within each n to make it faster.

But this works reasonably well for my needs.

Any comments are welcome!

cheers,

t.
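Both shortcuts mentioned in the message above (falling back to the first row,
and one global dictionary) can be avoided along the following lines. This is a
hypothetical sketch in plain Julia 1.x, not Thibaut's code: `lag_within`
groups row numbers by `n`, builds a small period-to-row table per group, and
writes `missing` where period t - 1 does not exist. Whether the per-group
tables are actually faster than one global Dict is not measured here.

```julia
# Hypothetical helper (not from the thread): lag `x` by one period of `t`
# within each `n`, with `missing` where period t - 1 does not exist.
function lag_within(n::AbstractVector, t::AbstractVector, x::AbstractVector{T}) where {T}
    # Group row numbers by n, so each lookup table only covers one n
    groups = Dict{eltype(n), Vector{Int}}()
    for i in eachindex(n)
        push!(get!(groups, n[i], Int[]), i)
    end
    out = Vector{Union{Missing, T}}(missing, length(x))
    for rows in values(groups)
        # Per-group table: period => row number
        pos = Dict(t[i] => i for i in rows)
        for i in rows
            j = get(pos, t[i] - 1, nothing)
            if j !== nothing
                out[i] = x[j]
            end
        end
    end
    return out
end

# Same layout as the example above, but small: n groups rows, t is the period
n = [0, 0, 0, 1, 1]
t = [0, 1, 2, 0, 2]
x = [5, 6, 7, 8, 9]
lagged = lag_within(n, t, x)
```

The last row (n = 1, t = 2) gets `missing` because period 1 is absent for
that group, instead of silently picking up the value from row 1.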




On Wednesday, 28 May 2014 10:04:25 UTC-5, Milan Bouchet-Valat wrote:
>
> On Wednesday 28 May 2014 at 15:49 +0100, Florian Oswald wrote:
> > Oh, sorry, I forgot to mention that it's a panel, so the lag must be taken
> > by "id" as in the example. It's not a simple time series: the first
> > lagged entry for each "id" must be NA.
> I think the easiest way is to write a loop, and since you're using Julia 
> it may well be faster than convoluted vectorized expressions if written 
> carefully. Something like (untested): 
>
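> using DataFrames
>
> # assumes rows are sorted by id, then time; `missing` is NA in Julia 1.x
> df.ylag = Vector{Union{Missing, Float64}}(missing, nrow(df))
> for i in 2:nrow(df)
>     if df[i, :id] == df[i - 1, :id]
>         df[i, :ylag] = df[i - 1, :y]
>     end
> end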
>
>
> Regards 
>
> > 
> > On 28 May 2014 15:45, Florian Oswald wrote:
> >         Hi,
> >
> >         I'm looking for the easiest way to create a lagged variable in
> >         a DataFrame. I'm almost there with this:
> >
> >         df = DataFrame(id=repeat([1:3],inner=[3],outer=[1]),time=repeat([1:3],inner=[1],outer=[3]),y=rand(9))
> >
> >         by(df, :id , d ->
> >         DataFrame(time=d[:,:time],y=d[:,:y],Ly=[0,d[1:end-1,:y]]))
> >
> >         but of course, instead of the 0.0 for the first entry of each
> >         id, I would like to have an NA:
> >
> >         by(df, :id , d ->
> >         DataFrame(time=d[:,:time],y=d[:,:y],Ly=[NA,d[1:end-1,:y]]))
> >
> >         but I can't figure out how to pass a valid data type here. This
> >         says "ERROR: no method convert(Type{Float64}, NAtype)"
> >
> >         I also tried this:
> >
> >         by(df, :id , d ->
> >         DataFrame(time=d[:,:time],y=d[:,:y],Ly=@data([NA,d[1:end-1,:y]])))
> >
> >         but that gives "ERROR: no method
> >         DataArray{T,N}(DataArray{Float64,1}, Array{Bool,1})".
> >         
> > 
> > 
>
>
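The loop from the reply above can be wrapped into a function that does not
require the rows to be pre-sorted. This is a hypothetical sketch in plain
Julia 1.x (no DataFrames, no DataArrays), with `missing` standing in for NA.
`panel_lag` visits rows in (id, time) order through `sortperm` and copies the
previous row's `y` only when that row has the same id, so the first entry of
each id stays `missing`.

```julia
# Hypothetical helper (not from the thread): the loop from the reply above,
# wrapped so it also works when rows are not already sorted by (id, t).
function panel_lag(id::AbstractVector, t::AbstractVector, y::AbstractVector{T}) where {T}
    # Visit rows in (id, t) order without reordering the data itself
    p = sortperm(collect(zip(id, t)))
    out = Vector{Union{Missing, T}}(missing, length(y))  # missing plays the role of NA
    for k in 2:length(p)
        i, prev = p[k], p[k - 1]
        # Only copy the previous row's y if it belongs to the same id
        if id[i] == id[prev]
            out[i] = y[prev]
        end
    end
    return out
end

# Florian's example panel, with deterministic y values instead of rand(9)
ids   = repeat(1:3, inner = 3)
times = repeat(1:3, outer = 3)
ys    = collect(1.0:9.0)
ylag  = panel_lag(ids, times, ys)
```

Like the loop, this lags by the previous observed row, so gaps in time are
not detected. With DataFrames.jl 1.x it would be applied as
`df.ylag = panel_lag(df.id, df.time, df.y)`. For a single, already sorted
group, `vcat(missing, y[1:end-1])` produces a `Vector{Union{Missing, Float64}}`,
which is the kind of vector the `[NA, d[1:end-1,:y]]` attempt was aiming for.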
