Thanks!

I wasn’t aware of eachrow; this seems quite close to what I had in mind. I ran 
some simplistic timing checks [1], and the eachrow method is 2-3x faster than 
my indexed loop. I also tried the type asserts, but they didn’t seem to make a 
difference. I forgot to mention earlier that my data can also contain NAs, so 
it’s not that easy for the compiler.

[1] 
http://nbviewer.ipython.org/urls/dl.dropbox.com/s/mj8g1s0ewmpd1b6/dataframe_iter_speed.ipynb?create=1
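
For the record, the two variants I timed are roughly the following (a minimal 
sketch with made-up Float64 data and no NAs; the actual timings are in the 
notebook [1]):

~~~
using DataFrames

n = 10^6
df = DataFrame(a = rand(n), b = rand(n), c = rand(n))

# per-element indexing, as in my original post
f(df::DataFrame, i::Integer) = df[i, :a] * df[i, :b] + df[i, :c]

function by_index(df::DataFrame)
    s = 0.0
    for i = 1:nrow(df)
        s += f(df, i)
    end
    s
end

# row iteration via eachrow
g(r::DataFrameRow) = r["a"] * r["b"] + r["c"]

function by_eachrow(df::DataFrame)
    s = 0.0
    for r in eachrow(df)
        s += g(r)
    end
    s
end

by_index(df); by_eachrow(df)  # warm up the JIT first
@time by_index(df)
@time by_eachrow(df)
~~~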

Cheers,
Joosep

On 01 Feb 2014, at 15:11, David van Leeuwen <[email protected]> wrote:

> Hi, 
> 
> There now is the eachrow iterator which might do what you want more 
> efficiently.
> 
> using DataFrames
> 
> df = DataFrame(a=1:2, b=2:3)
> func(r::DataFrameRow) = r["a"] * r["b"]
> for r in eachrow(df)
>     println(func(r))
> end
> You can also use integer indices with the DataFrameRow r, e.g. r[1] * r[2].
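> 
> For example, the same loop with positional access (func2 is just an 
> illustrative name):
> 
> func2(r::DataFrameRow) = r[1] * r[2]
> for r in eachrow(df)
>     println(func2(r))
> end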
> 
> Cheers, 
> 
> ---david
> 
> On Saturday, February 1, 2014 1:25:04 PM UTC+1, Joosep Pata wrote:
> I would like to do an explicit loop over a large DataFrame and evaluate a 
> function which depends on a subset of the columns in an arbitrary way. What 
> would be the fastest way to accomplish this? Presently, I’m doing something 
> like 
> 
> ~~~ 
> using DataFrames
> 
> f(df::DataFrame, i::Integer) = df[i, :a] * df[i, :b] + df[i, :c]
> 
> for i = 1:nrow(df)
>     x = f(df, i)
> end
> ~~~ 
> 
> which, according to Profile, creates a major bottleneck. 
> 
> Would it make sense to somehow pre-create an immutable type corresponding to 
> a single row (my data are BitsKind), and run a compiled function on these 
> row-objects with strong typing? 
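> 
> Something like the following rough, untested sketch is what I have in mind 
> (assuming for illustration that the columns :a, :b, :c are Float64 and never 
> NA; Row is just a name I made up): 
> 
> ~~~ 
> immutable Row 
>     a::Float64 
>     b::Float64 
>     c::Float64 
> end 
> 
> # f is now fully typed, so it should compile to tight code 
> f(r::Row) = r.a * r.b + r.c 
> 
> for i = 1:nrow(df) 
>     r = Row(df[i, :a], df[i, :b], df[i, :c]) 
>     x = f(r) 
> end 
> ~~~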
> 
> Thanks in advance for any advice, 
> Joosep
