Re: [julia-users] Re: Newbie question. Need help with grouping dataframes, cumulative sums and plotting.

Tom Short Wed, 04 May 2016 06:04:46 -0700

Here's another way with DataFramesMeta [1]:

using DataFrames, DataFramesMeta, RDatasets
df = dataset("datasets", "iris")
@transform(groupby(df, :Species), cs = cumsum(:PetalLength))




[1] https://github.com/JuliaStats/DataFramesMeta.jl/
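
If it helps to see what the grouped cumulative sum is doing, here is a rough sketch in plain Julia with no packages. It is only an illustration of the idea, not how DataFramesMeta implements it; the `grouped_cumsum` helper and the `labels`/`qty` data are made up for the example.

```julia
# Hypothetical helper: running total of `values`, kept separately for each label.
function grouped_cumsum(labels, values)
    totals = Dict{eltype(labels), eltype(values)}()  # running total per label
    out = similar(values)
    for i in eachindex(labels, values)
        # Start from zero the first time a label is seen, then accumulate.
        totals[labels[i]] = get(totals, labels[i], zero(eltype(values))) + values[i]
        out[i] = totals[labels[i]]
    end
    return out
end

labels = ["a", "a", "b", "a", "b"]      # made-up data
qty    = [1.0, 2.0, 10.0, 3.0, 20.0]
cs = grouped_cumsum(labels, qty)
```

Each label keeps its own running total, so the rows do not need to be sorted by label first.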



On Wed, May 4, 2016 at 8:09 AM, Cedric St-Jean <cedric.stj...@gmail.com>
wrote:

> "Do blocks" are one of my favourite things about Julia, they're explained in
> the docs
> <http://docs.julialang.org/en/release-0.4/manual/functions/#do-block-syntax-for-function-arguments>.
> Basically it's just a convenient way of defining and passing a function
> (the code that comes after `do`) to another function (in this case, `by`).
> `by` goes over the dataframe, splits it into 3 subdataframes (one for each
> Species in the iris dataset), and calls the do-block for each of them. Then
> their return values (the last line in the do-block) get concatenated
> together to form the final result. The code I really wanted to write is:
>
> using RDatasets
> df = dataset("datasets", "iris")
> # For each species
> df2 = by(df, :Species) do sub_df
>     sub_df = copy(sub_df)   # don't modify the original dataframe
>     # Add a :cumulative_PetalLength column
>     sub_df[:cumulative_PetalLength] = cumsum(sub_df[:PetalLength])
>     # Return the new sub-dataframe
>     sub_df
> end
>
> but unfortunately, this code doesn't work with DataFrames.jl.
>
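
A small sketch of the do-block idea described above, using `map` from Base rather than `by` so that it runs without any packages. The variable names are only for illustration.

```julia
# `map` takes a function as its first argument.
squares_a = map(x -> x^2, [1, 2, 3])

# The do-block builds the same anonymous function (with argument `x`)
# and passes it to `map` as the first argument.
squares_b = map([1, 2, 3]) do x
    x^2
end
```

Both calls do the same thing; the do-block form is just easier to read when the function body spans several lines, as it does with `by`.
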
>
> On Wednesday, May 4, 2016 at 4:42:41 AM UTC-4, Ben Southwood wrote:
>>
>> Thanks Cedric, that worked very well.  I'm having a little trouble
>> following the documentation as to how the "by ... do ..." structure
>> actually works.  Would you mind explaining what the code is doing?
>>
>> On Tuesday, May 3, 2016 at 10:07:10 PM UTC-4, Cedric St-Jean wrote:
>>>
>>> Something like
>>>
>>> using RDatasets
>>> df = dataset("datasets", "iris")
>>> df[:cumulative_PetalLength] = 0.0
>>> by(df, :Species) do sub_df
>>>     sub_df[:cumulative_PetalLength] = cumsum(sub_df[:PetalLength])
>>>     sub_df
>>> end
>>>
>>> though I hope someone can provide a more elegant solution. `sub_df` is a
>>> SubDataFrame, and those objects can neither have a new column nor be
>>> converted to DataFrame.
>>>
>>> On Tuesday, May 3, 2016 at 4:22:29 PM UTC-4, Ben Southwood wrote:
>>>>
>>>> I have the following dataframe with values of the form
>>>>
>>>> date1,label1,qty1_1
>>>> date2,label1,qty1_2
>>>> date3,label1,qty1_3
>>>> ....
>>>> dateN,label1,qty1_N
>>>> date1,label2,qty2_1
>>>> date2,label2,qty2_2
>>>> date3,label2,qty2_3
>>>> ....
>>>> dateN,label2,qty2_N
>>>> ....
>>>>
>>>>
>>>>
>>>> I would like to compute a cumulative sum of the qtys that accumulates
>>>> separately within each label. I'd then have
>>>>
>>>> date1,label1,cuml1_1
>>>> date2,label1,cuml1_2
>>>> date3,label1,cuml1_3
>>>> ....
>>>> dateN,label1,cuml1_N
>>>> date1,label2,cuml2_1
>>>>
>>>>
>>>>
>>>> This way I can use gadfly and run the following plot
>>>>
>>>>
>>>> plot(x=grouped[:date],y=grouped[:cuml_sum],color=grouped[:label],Geom.line)
>>>>
>>>>
>>>> and have each cuml sum have its own colouring by date.  I'm stuck on
>>>> how to do this simply without creating lookups. Any help? Thanks!
>>>>
>>>>
>>>>
