So I'm guessing I'm doing something wrong, because this is much slower... I've simplified things in my code a little to maybe help show what is going on:
for c1 = levels[1], c2 = levels[2], c3 = levels[3], c4 = levels[4],
    c5 = levels[5], c6 = levels[6], c7 = levels[7], c8 = levels[8],
    c9 = levels[9], c10 = levels[10], c11 = levels[11], c12 = levels[12],
    c13 = levels[13]
    for j = 1:rows
        s[j] = df[j, c1]
        s[j] += df[j, c2]
        s[j] += df[j, c3]
        s[j] += df[j, c4]
        s[j] += df[j, c5]
        s[j] += df[j, c6]
        s[j] += df[j, c7]
        s[j] += df[j, c8]
        s[j] += df[j, c9]
        s[j] += df[j, c10]
        s[j] += df[j, c11]
        s[j] += df[j, c12]
        s[j] += df[j, c13]
    end
end

On Sunday, March 2, 2014 12:17:55 PM UTC-5, Jason Solack wrote:
>
> I will give this a shot. Thank you for the reply and all your work on
> Julia/DataFrames. It's much appreciated!
>
> On Sunday, March 2, 2014 12:13:22 PM UTC-5, John Myles White wrote:
>>
>> I’m a little fuzzy still, but I think the answer is probably still that
>> the problem you’re hitting is that the indexing into the DataFrame isn’t
>> sufficient to let the compiler know that the return type of the index is
>> always a Float64. So you’ll want to try some of the tricks described in
>> the thread I linked to, which reduce to doing things like the following:
>>
>> da = @data([1.0, 2.0, 3.0])
>> s = 0.0
>> for i in 1:3
>>     s += da.data[i]
>> end
>>
>> — John
>>
>> On Mar 2, 2014, at 9:08 AM, Jason Solack <jays...@gmail.com> wrote:
>>
>> The DataFrame contains floats, and I'd ultimately like to have an array
>> of size nrow(data) with the sum of those 13 columns in it (the column
>> combination changes with each iteration).
>>
>> Is that enough detail?
>>
>> I've done the entire algorithm in C++, and at this point Julia is a bit
>> slower, but I have a feeling parts of my Julia code are just not
>> efficient and therefore could be faster!
>>
>> On Sunday, March 2, 2014 12:03:48 PM UTC-5, John Myles White wrote:
>>>
>>> Hi Jason,
>>>
>>> Can you give a few more details about what these objects are? What is
>>> data? What is levels?
>>>
>>> In general, the performance problems with DataFrames are actually
>>> performance issues with DataArrays not letting type inference work
>>> well. We still haven’t agreed on the right solution, but this thread
>>> lays out the relevant issues:
>>> https://github.com/JuliaStats/DataArrays.jl/issues/71
>>>
>>> — John
>>>
>>> On Mar 2, 2014, at 9:01 AM, Jason Solack <jays...@gmail.com> wrote:
>>>
>>> > Hello everyone,
>>> >
>>> > I am doing several million iterations over a DataFrame, and I need
>>> > to perform several computations over various combinations of
>>> > columns. The first of these is a simple sum of 13 columns, and this
>>> > appears to be a slow point of execution.
>>> >
>>> > Right now I'm doing something like this:
>>> >
>>> > for c1 = levels[1], c2 = levels[2], c3 = levels[3], c4 = levels[4],
>>> >     c5 = levels[5], c6 = levels[6], c7 = levels[7], c8 = levels[8],
>>> >     c9 = levels[9], c10 = levels[10], c11 = levels[11],
>>> >     c12 = levels[12], c13 = levels[13]
>>> >
>>> >     s = data[levelStartColums[1]+(c1-1)] .+
>>> >         data[levelStartColums[2]+(c2-1)] .+
>>> >         data[levelStartColums[3]+(c3-1)] .+
>>> >         data[levelStartColums[4]+(c4-1)] .+
>>> >         data[levelStartColums[5]+(c5-1)] .+
>>> >         data[levelStartColums[6]+(c6-1)] .+
>>> >         data[levelStartColums[7]+(c7-1)] .+
>>> >         data[levelStartColums[8]+(c8-1)] .+
>>> >         data[levelStartColums[9]+(c9-1)] .+
>>> >         data[levelStartColums[10]+(c10-1)] .+
>>> >         data[levelStartColums[11]+(c11-1)] .+
>>> >         data[levelStartColums[12]+(c12-1)] .+
>>> >         data[levelStartColums[13]+(c13-1)]
>>> > end
>>> >
>>> > Does anyone have any tips on how to write this so it will execute
>>> > faster?
>>> >
>>> > Thank you in advance!
>>> >
>>> > Jason
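For the record, here is one way John's .data trick could be applied to the simplified loop at the top of this message. This is a minimal, untested sketch, not a confirmed fix: it assumes every column involved is a Float64 DataArray with no NAs (otherwise reading the raw .data vector directly is unsafe), that df, levels, rows, and s are exactly as defined above, and it introduces a hypothetical helper named sumcols! as a "function barrier" so the hot loop runs on concretely typed vectors even though the column lookup itself is not type stable.

# Untested sketch: combine the `.data` trick with a function barrier.
# Assumes the 13 columns are Float64 DataArrays with no NAs, and that
# `df`, `levels`, `rows`, and `s` are set up as in the post above.
# `sumcols!` is a name invented here, not part of DataFrames.

function sumcols!(s::Vector{Float64}, cols::Vector{Vector{Float64}}, rows::Int)
    for j = 1:rows
        t = 0.0
        for col in cols      # the element type of cols is concrete,
            t += col[j]      # so col[j] is known to be a Float64
        end
        s[j] = t
    end
    return s
end

for c1 = levels[1], c2 = levels[2], c3 = levels[3], c4 = levels[4],
    c5 = levels[5], c6 = levels[6], c7 = levels[7], c8 = levels[8],
    c9 = levels[9], c10 = levels[10], c11 = levels[11], c12 = levels[12],
    c13 = levels[13]
    # Hoist each column's raw Float64 vector out of the DataFrame once
    # per combination: df[c] returns the column as a DataArray, and
    # .data is its underlying Vector{Float64}.
    cols = Vector{Float64}[df[c].data for c in
        (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13)]
    sumcols!(s, cols, rows)
end

The function barrier matters because inside sumcols! the argument types are concrete, so the compiler can emit specialized code for the inner loop instead of boxing each element it reads out of the DataFrame.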