I would like to do an explicit loop over a large DataFrame and evaluate a
function which depends on a subset of the columns in an arbitrary way. What
would be the fastest way to accomplish this? Presently, I’m doing something like
~~~
using DataFrames

# Evaluate the expression on row i of df
f(df::DataFrame, i::Integer) = df[i, :a] * df[i, :b] + df[i, :c]

for i in 1:nrow(df)
    x = f(df, i)
end
~~~
which, according to Profile, is a major bottleneck.
Would it make sense to somehow pre-create an immutable type corresponding to a
single row (my data are BitsKind), and run a compiled function on these
row-objects with strong typing?
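
To make the idea concrete, here is a rough sketch of what I have in mind, not a tested solution. It assumes three `Float64` columns; `Row` and `evaluate` are hypothetical names I made up, and plain vectors stand in for the DataFrame columns so the snippet runs on its own. It uses current Julia syntax, where `struct` replaces the older `immutable` keyword.

~~~julia
# Hypothetical row type; field names and types are assumptions for illustration.
struct Row
    a::Float64
    b::Float64
    c::Float64
end

# The per-row function now sees concrete field types.
f(r::Row) = r.a * r.b + r.c

# Function barrier: inside this function the column vectors have
# concrete element types, so the loop body can be specialized.
function evaluate(a::Vector{Float64}, b::Vector{Float64}, c::Vector{Float64})
    out = Vector{Float64}(undef, length(a))
    for i in eachindex(a, b, c)
        out[i] = f(Row(a[i], b[i], c[i]))
    end
    return out
end

# Stand-in columns; with a DataFrame these would be the column
# vectors themselves, e.g. df.a, df.b, df.c.
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
c = [0.5, 0.5, 0.5]
x = evaluate(a, b, c)
~~~

The point of the sketch is that the columns are pulled out of the DataFrame once, and the loop then runs only over typed vectors and small immutable row objects.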
Thanks in advance for any advice,
Joosep