I think I have to give up and grudgingly revert to pandas/R. I just tried 
to do this within a loop: setting observations to NA when a column for a 
given year fails a comparison against a transformation of other columns 
for the same year. This is my (failed) attempt:

for i = firstyear:lastyear
    @where(df, array((convert(Symbol, "col1_"*string(i)) .<= 2*convert(Symbol, "colx_"*string(i))) |
                     (convert(Symbol, "col1_"*string(i)) .> 400*convert(Symbol, "colx_"*string(i))) |
                     (convert(Symbol, "col2_"*string(i)) .> 5000) |
                     (convert(Symbol, "col2_"*string(i)) .< 500), false))[:col] = NA
end

I think this is beyond salvation and maybe not really feasible with 
DataFrames at the moment. 
For comparison, this would be the Stata command:

replace col`i'=. if col1_`i' <= 2*colx_`i' | col1_`i' > 400*colx_`i' | col2_`i' > 5000 | col2_`i' < 500

Of course a highly optimized software package like Stata is an unfair 
comparison, but still the difference is pretty striking...
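
For what it's worth, here is a rough sketch of how the same replacement might look without any macro, building the column names as Symbols at run time and indexing the data frame directly. It assumes a recent DataFrames.jl (1.x, with `missing`) rather than the NA-based version used in the attempt above. The toy data, the year range, and the target column name `col_<year>` are made up for illustration.

```julia
using DataFrames

# Toy data for a single year; all values and the year range are hypothetical.
firstyear, lastyear = 2001, 2001
df = DataFrame(
    col1_2001 = [1.0, 50.0, 900.0, 50.0],
    colx_2001 = [1.0, 1.0, 1.0, 1.0],
    col2_2001 = [1000.0, 1000.0, 1000.0, 100.0],
    col_2001  = [10.0, 20.0, 30.0, 40.0],
)

for i in firstyear:lastyear
    # Build the column names at run time; no macro is involved.
    col1 = df[!, Symbol("col1_", i)]
    colx = df[!, Symbol("colx_", i)]
    col2 = df[!, Symbol("col2_", i)]
    target = Symbol("col_", i)   # hypothetical name for the column to blank out

    # Same condition as in the loop above; as with `array(..., false)` there,
    # a comparison that yields missing is treated as false.
    bad = coalesce.((col1 .<= 2 .* colx) .| (col1 .> 400 .* colx) .|
                    (col2 .> 5000) .| (col2 .< 500), false)

    allowmissing!(df, target)    # let the column hold `missing`
    df[bad, target] .= missing   # blank out the flagged rows in place
end
```

With the toy data, only the second row passes every check, so it should be the only row of `col_2001` that keeps its value.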
