Re: [julia-users] Re: Newbie question. Need help with grouping dataframes, cumulative sums and plotting.

Cedric St-Jean Wed, 04 May 2016 06:24:01 -0700

That's way better, thank you!

I never thought I'd say this, but I miss pandas. I could write


df['cs'] = df.groupby('Species')['PetalLength'].cumsum()

That's not possible in Julia because DataFrames don't have a row index.
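
For comparison, here is a minimal sketch of the same per-row grouped cumulative sum in plain Julia, without DataFrames. It assumes Julia 1.x syntax (`undef`), which is newer than this thread, and `grouped_cumsum`, `labels` and `qtys` are hypothetical names standing in for two columns:

```julia
# Grouped cumulative sum that keeps the original row order.
# `labels` and `qtys` are hypothetical stand-ins for two DataFrame columns
# of equal length.
function grouped_cumsum(labels, qtys)
    totals = Dict{eltype(labels), Float64}()      # running total per label
    out = Vector{Float64}(undef, length(qtys))    # one result per input row
    for i in eachindex(qtys)
        totals[labels[i]] = get(totals, labels[i], 0.0) + qtys[i]
        out[i] = totals[labels[i]]
    end
    return out
end

labels = ["a", "a", "b", "a", "b"]
qtys = [1.0, 2.0, 10.0, 3.0, 20.0]
cs = grouped_cumsum(labels, qtys)   # [1.0, 3.0, 10.0, 6.0, 30.0]
```

Because the result has one entry per input row, it could in principle be assigned back as a new column without needing a row index.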

On Wednesday, May 4, 2016 at 9:04:21 AM UTC-4, tshort wrote:
>
> Here's another way with DataFramesMeta [1]:
>
> using DataFrames, DataFramesMeta, RDatasets
> df = dataset("datasets", "iris")
> @transform(groupby(df, :Species), cs = cumsum(:PetalLength))
>
> 
>
> [1] https://github.com/JuliaStats/DataFramesMeta.jl/
>
>
>
> On Wed, May 4, 2016 at 8:09 AM, Cedric St-Jean wrote:
>
>> "Do blocks" are one of my favourite things about Julia, they're explained in 
>> the docs 
>> <http://docs.julialang.org/en/release-0.4/manual/functions/#do-block-syntax-for-function-arguments>.
>>  
>> Basically it's just a convenient way of defining and passing a function 
>> (the code that comes after `do`) to another function (in this case, `by`). 
>> `by` goes over the dataframe, splits it into 3 subdataframes (one for each 
>> Species in the iris dataset), and calls the do-block for each of them. Then 
>> their return values (the last line in the do-block) get concatenated 
>> together to form the final result. The code I really wanted to write is:
>>
>> using RDatasets
>> df = dataset("datasets", "iris")
>> # For each species
>> df2 = by(df, :Species) do sub_df
>>    sub_df = copy(sub_df)   # don't modify the original dataframe
>>    # Add a :cumulative_PetalLength column
>>    sub_df[:cumulative_PetalLength] = cumsum(sub_df[:PetalLength])
>>    # Return the new sub-dataframe
>>    sub_df
>> end
>>
>> but unfortunately, this code doesn't work with DataFrames.jl
>>
>>
>> On Wednesday, May 4, 2016 at 4:42:41 AM UTC-4, Ben Southwood wrote:
>>>
>>> Thanks Cedric, that worked very well.  I'm having a little trouble 
>>> following the documentation as to how the "by ... do ..." structure 
>>> actually works.  Would you mind explaining what the code is doing?
>>>
>>> On Tuesday, May 3, 2016 at 10:07:10 PM UTC-4, Cedric St-Jean wrote:
>>>>
>>>> Something like 
>>>>
>>>> using RDatasets
>>>> df = dataset("datasets", "iris")
>>>> df[:cumulative_PetalLength] = 0.0
>>>> by(df, :Species) do sub_df
>>>>     sub_df[:cumulative_PetalLength] = cumsum(sub_df[:PetalLength])
>>>>     sub_df
>>>> end
>>>>
>>>> though I hope someone can provide a more elegant solution. `sub_df` is a 
>>>> SubDataFrame, and those objects can neither have a new column nor be 
>>>> converted to DataFrame.
>>>>
>>>> On Tuesday, May 3, 2016 at 4:22:29 PM UTC-4, Ben Southwood wrote:
>>>>>
>>>>> I have the following dataframe with values of the form
>>>>>
>>>>> date1,label1,qty1_1
>>>>> date2,label1,qty1_2
>>>>> date3,label1,qty1_3
>>>>> ....
>>>>> dateN,label1,qty1_N
>>>>> date1,label2,qty2_1
>>>>> date2,label2,qty2_2
>>>>> date3,label2,qty2_3
>>>>> ....
>>>>> dateN,label2,qty2_N
>>>>> ....
>>>>>
>>>>>
>>>>>
>>>>> I would like to compute a cumulative sum of the qtys within each 
>>>>> label, so that the running total restarts for every label. I'd then have
>>>>>
>>>>> date1,label1,cuml1_1
>>>>> date2,label1,cuml1_2
>>>>> date3,label1,cuml1_3
>>>>> ....
>>>>> dateN,label1,cuml1_N
>>>>> date1,label2,cuml2_1
>>>>>
>>>>>
>>>>>
>>>>> This way I can use gadfly and run the following plot
>>>>>
>>>>>
>>>>> plot(x=grouped[:date],y=grouped[:cuml_sum],color=grouped[:label],Geom.line)
>>>>>
>>>>>
>>>>> and have each label's cumulative sum plotted against date in its own 
>>>>> colour. I'm stuck on how to do this simply without creating lookups. 
>>>>> Any help? Thanks!
>>>>>
>>>>>
>>>>>
>
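
The do-block explanation quoted above can be illustrated without any packages. This is only a sketch: `each_group` is a hypothetical helper that loosely mimics the described behaviour of `by` (call a function on each group, then concatenate the results), not the DataFrames implementation.

```julia
# Hypothetical helper: applies `f` to each group and concatenates the results.
# The function argument comes first so that do-block syntax can be used.
function each_group(f, groups)
    return vcat([f(g) for g in groups]...)
end

groups = [[1.0, 2.0, 3.0], [10.0, 20.0]]

# Passing an anonymous function explicitly...
a = each_group(g -> cumsum(g), groups)

# ...is equivalent to the do-block form, where the block after `do`
# becomes the first argument of `each_group`.
b = each_group(groups) do g
    cumsum(g)
end
```

Both `a` and `b` hold the per-group cumulative sums joined into one vector.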
