Something like this should work:

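using RDatasets
df = dataset("datasets", "iris")
# Create the result column up front on the full DataFrame.
df[:cumulative_PetalLength] = 0.0
# For each Species group, store the running total of PetalLength.
by(df, :Species) do sub_df
    sub_df[:cumulative_PetalLength] = cumsum(sub_df[:PetalLength])
    sub_df
end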

though I hope someone can provide a more elegant solution. `sub_df` is a 
SubDataFrame, and those objects can neither have a new column added nor be 
converted to a DataFrame, which is why the column is created on `df` first.
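
For anyone reading this on a newer setup: below is a minimal sketch of the 
same idea, assuming a recent DataFrames.jl release where `groupby` and 
`transform` are available (the `by` function used above belongs to the older 
API). The toy data and the column names `date`, `label`, `qty` and `cuml_sum` 
are made up to mirror the question; they are not from a real dataset.

```julia
using DataFrames

# Toy data in the same long format as the question (values are invented).
df = DataFrame(
    date  = [1, 2, 3, 1, 2, 3],
    label = ["label1", "label1", "label1", "label2", "label2", "label2"],
    qty   = [1, 2, 3, 10, 20, 30],
)

# Group by label, then add a running total of qty within each group.
# transform keeps all original rows and appends the new column.
grouped = transform(groupby(df, :label), :qty => cumsum => :cuml_sum)
```

The running total restarts at the first row of each label, so `grouped` 
should carry the `date`, `label` and `cuml_sum` columns that the plot call in 
the question expects.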

On Tuesday, May 3, 2016 at 4:22:29 PM UTC-4, Ben Southwood wrote:
>
> I have the following dataframe with values of the form
>
> date1,label1,qty1_1
> date2,label1,qty1_2
> date3,label1,qty1_3
> ....
> dateN,label1,qty1_N
> date1,label2,qty2_1
> date2,label2,qty2_2
> date3,label2,qty2_3
> ....
> dateN,label2,qty2_N
> ....
>
>
>
> I would like to take the cumulative sum of the qtys such that the value of 
> the cumulative sum only increases within each label. Then I'd have
>
> date1,label1,cuml1_1
> date2,label1,cuml1_2
> date3,label1,cuml1_3
> ....
> dateN,label1,cuml1_N
> date1,label2,cuml2_1
>
>
>
> This way I can use gadfly and run the following plot
>
> plot(x=grouped[:date],y=grouped[:cuml_sum],color=grouped[:label],Geom.line)
>
>
> and have each cuml sum have its own colouring by date.  I'm stuck on how 
> to do this simply without creating lookups. Any help? Thanks!
>
>
>
