On Wednesday 28 May 2014 at 15:49 +0100, Florian Oswald wrote:
> Oh sorry, I forgot to mention that it's a panel, so the lag must be by
> "id" as in the example. It's not a simple time series: the first
> lagged entry for each "id" must be NA.
I think the easiest way is to write a loop, and since you're using Julia
it may well be faster than convoluted vectorized expressions if written
carefully. Something like (untested):

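# Assumes df is sorted by :id, then :time, and that :ylag already
# exists as a column that can hold NA.
df[1, :ylag] = NA
for i in 2:size(df, 1)   # loop over rows, starting at the second one
    if df[i, :id] == df[i - 1, :id]
        df[i, :ylag] = df[i - 1, :y]
    else
        df[i, :ylag] = NA   # first row of each id has no lag
    end
end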
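
For readers on a current Julia release, the same idea can be sketched on
plain vectors, without DataFrames. This is only an illustration under
assumptions that are not in the thread: it targets Julia 1.x, where
`missing` plays the role that NA played in 2014, the helper name
`panel_lag` is hypothetical, and the rows are assumed to be sorted by id
and then by time.

```julia
# Lag `y` within each panel unit.
# `panel_lag` is a hypothetical helper name, not part of any package.
# Assumes rows are sorted by id, then by time.
function panel_lag(id::AbstractVector, y::AbstractVector{T}) where {T}
    length(id) == length(y) ||
        throw(ArgumentError("id and y must have the same length"))
    # Result type allows `missing`; every entry starts out missing.
    ylag = Vector{Union{Missing,T}}(missing, length(y))
    for i in 2:length(y)
        # Copy the previous value only when the previous row has the same id.
        if id[i] == id[i - 1]
            ylag[i] = y[i - 1]
        end
    end
    return ylag
end

id = repeat(1:3, inner=3)   # 1, 1, 1, 2, 2, 2, 3, 3, 3
y = collect(1.0:9.0)        # 1.0, 2.0, ..., 9.0
ylag = panel_lag(id, y)
# ylag is [missing, 1.0, 2.0, missing, 4.0, 5.0, missing, 7.0, 8.0]
```

Starting the result as all `missing` means the first row of each id needs
no special case: the loop only overwrites entries that have a valid
predecessor.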


Regards

> 
> On 28 May 2014 15:45, Florian Oswald <[email protected]> wrote:
>         Hi
>         
>         
>         I'm looking for the easiest way to create a lagged variable in
>         a dataframe. I'm almost there with this:
>         
>         
>         df = DataFrame(id=repeat([1:3],inner=[3],outer=[1]),time=repeat([1:3],inner=[1],outer=[3]),y=rand(9))
>         by(df, :id , d ->
>         DataFrame(time=d[:,:time],y=d[:,:y],Ly=[0,d[1:end-1,:y]]))
>         
>         
>         but of course instead of the 0.0 for the first entry for each
>         id I would like to have an NA:
>         
>         
>         by(df, :id , d ->
>         DataFrame(time=d[:,:time],y=d[:,:y],Ly=[NA,d[1:end-1,:y]]))
>         
>         
>         
>         but I can't figure out how to pass a valid data type here. This
>         says "ERROR: no method convert(Type{Float64}, NAtype)"
>         
>         
>         I also tried this
>         
>         
>         by(df, :id , d ->
>         DataFrame(time=d[:,:time],y=d[:,:y],Ly=@data([NA,d[1:end-1,:y]])))
>         
>         
>         
>         but that gives "ERROR: no method
>         DataArray{T,N}(DataArray{Float64,1}, Array{Bool,1})".
>         
>         
>         
> 
> 
