So I'm guessing I'm doing something wrong, because this is much slower... I've simplified things in my code a little to maybe help show what is going on:
for c1 = levels[1], c2 = levels[2], c3 = levels[3], c4 = levels[4],
    c5 = levels[5], c6 = levels[6], c7 = levels[7], c8 = levels[8],
    c9 = levels[9], c10 = levels[10], c11 = levels[11], c12 = levels[12],
    c13 = levels[13]
    for j = 1:rows
        s[j] = df[j, c1]
        s[j] += df[j, c2]
        s[j] += df[j, c3]
        s[j] += df[j, c4]
        s[j] += df[j, c5]
        s[j] += df[j, c6]
        s[j] += df[j, c7]
        s[j] += df[j, c8]
        s[j] += df[j, c9]
        s[j] += df[j, c10]
        s[j] += df[j, c11]
        s[j] += df[j, c12]
        s[j] += df[j, c13]
    end
end

On Sunday, March 2, 2014 12:17:55 PM UTC-5, Jason Solack wrote:
>
> I will give this a shot. Thank you for the reply and all your work on
> Julia/DataFrames. It's much appreciated!
>
> On Sunday, March 2, 2014 12:13:22 PM UTC-5, John Myles White wrote:
>>
>> I’m a little fuzzy still, but I think the answer is probably still that
>> the problem you’re hitting is that the indexing into the DataFrame isn’t
>> sufficient to let the compiler know that the return type of the index is
>> always a Float64. So you’ll want to try some of the tricks described in
>> the thread I linked to, which reduce to doing things like the following:
>>
>> da = @data([1.0, 2.0, 3.0])
>> s = 0.0
>> for i in 1:3
>>     s += da.data[i]
>> end
>>
>> — John
>>
>> On Mar 2, 2014, at 9:08 AM, Jason Solack <jays...@gmail.com> wrote:
>>
>> The DataFrame contains floats, and I'd ultimately like to have an array
>> of size nrow(data) with the sum of those 13 columns in it (the column
>> combination changes with each iteration).
>>
>> Is that enough detail?
>>
>> I've done the entire algorithm in C++, and at this point Julia is a bit
>> slower, but I have a feeling parts of my Julia code are just not
>> efficient and therefore could be faster!
>>
>> On Sunday, March 2, 2014 12:03:48 PM UTC-5, John Myles White wrote:
>>>
>>> Hi Jason,
>>>
>>> Can you give a few more details about what these objects are? What is
>>> data? What is levels?
>>>
>>> In general, the performance problems with DataFrames are actually
>>> performance issues with DataArrays not letting type inference work
>>> well. We still haven’t agreed on the right solution, but this thread
>>> lays out the relevant issues:
>>> https://github.com/JuliaStats/DataArrays.jl/issues/71
>>>
>>> — John
>>>
>>> On Mar 2, 2014, at 9:01 AM, Jason Solack <jays...@gmail.com> wrote:
>>>
>>> > Hello everyone,
>>> >
>>> > I am doing several million iterations over a DataFrame, and I need
>>> > to perform several computations over various combinations of
>>> > columns. The first of these is a simple sum of 13 columns, and this
>>> > appears to be a slow point of execution.
>>> >
>>> > Right now I'm doing something like this:
>>> >
>>> > for c1 = levels[1], c2 = levels[2], c3 = levels[3], c4 = levels[4],
>>> >     c5 = levels[5], c6 = levels[6], c7 = levels[7], c8 = levels[8],
>>> >     c9 = levels[9], c10 = levels[10], c11 = levels[11],
>>> >     c12 = levels[12], c13 = levels[13]
>>> >
>>> >     s = data[levelStartColums[1]+(c1-1)] .+
>>> >         data[levelStartColums[2]+(c2-1)] .+
>>> >         data[levelStartColums[3]+(c3-1)] .+
>>> >         data[levelStartColums[4]+(c4-1)] .+
>>> >         data[levelStartColums[5]+(c5-1)] .+
>>> >         data[levelStartColums[6]+(c6-1)] .+
>>> >         data[levelStartColums[7]+(c7-1)] .+
>>> >         data[levelStartColums[8]+(c8-1)] .+
>>> >         data[levelStartColums[9]+(c9-1)] .+
>>> >         data[levelStartColums[10]+(c10-1)] .+
>>> >         data[levelStartColums[11]+(c11-1)] .+
>>> >         data[levelStartColums[12]+(c12-1)] .+
>>> >         data[levelStartColums[13]+(c13-1)]
>>> > end
>>> >
>>> > Does anyone have any tips on how to write this so it will execute
>>> > faster?
>>> >
>>> > Thank you in advance!
>>> >
>>> > Jason
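For the record, here is one way John's .data trick could be applied to the simplified loop at the top of this message. This is a minimal, untested sketch, not a confirmed fix: it assumes every column involved is a Float64 DataArray with no NAs (otherwise reading the raw .data vector directly is unsafe), that df, levels, rows, and s are exactly as defined above, and it introduces a hypothetical helper named sumcols! as a "function barrier" so the hot loop runs on concretely typed vectors even though the column lookup itself is not type stable.

# Untested sketch: combine the `.data` trick with a function barrier.
# Assumes the 13 columns are Float64 DataArrays with no NAs, and that
# `df`, `levels`, `rows`, and `s` are set up as in the post above.
# `sumcols!` is a name invented here, not part of DataFrames.

function sumcols!(s::Vector{Float64}, cols::Vector{Vector{Float64}}, rows::Int)
    for j = 1:rows
        t = 0.0
        for col in cols      # the element type of cols is concrete,
            t += col[j]      # so col[j] is known to be a Float64
        end
        s[j] = t
    end
    return s
end

for c1 = levels[1], c2 = levels[2], c3 = levels[3], c4 = levels[4],
    c5 = levels[5], c6 = levels[6], c7 = levels[7], c8 = levels[8],
    c9 = levels[9], c10 = levels[10], c11 = levels[11], c12 = levels[12],
    c13 = levels[13]
    # Hoist each column's raw Float64 vector out of the DataFrame once
    # per combination: df[c] returns the column as a DataArray, and
    # .data is its underlying Vector{Float64}.
    cols = Vector{Float64}[df[c].data for c in
        (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13)]
    sumcols!(s, cols, rows)
end

The function barrier matters because inside sumcols! the argument types are concrete, so the compiler can emit specialized code for the inner loop instead of boxing each element it reads out of the DataFrame.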