Thanks Cedric, that worked very well.  I'm having a little trouble 
following the documentation as to how the "by ... do ..." structure 
actually works.  Would you mind explaining what the code is doing?
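
For reference, the `do` block is general Julia syntax rather than something specific to `by`: the block after `do` becomes an anonymous function that is passed as the first argument of the call. The sketch below uses only plain Julia, with no DataFrames involved. `apply_per_group` is a hypothetical stand-in written for illustration, not the real `by`. By analogy, `by(df, :Species) do sub_df ... end` presumably reads as `by(sub_df -> ..., df, :Species)`, with the block called once per group.

```julia
# `f(args...) do x ... end` passes the block as an anonymous function
# in the FIRST argument position of `f`.
squares_do = map([1, 2, 3]) do x
    x^2
end

# The same call written with an explicit anonymous function.
squares_lambda = map(x -> x^2, [1, 2, 3])

# Hypothetical stand-in for `by` (not the DataFrames function):
# call `f` once per group and collect the results.
function apply_per_group(f, groups)
    return [f(g) for g in groups]
end

# Each group gets its own running total, restarting at the group boundary.
running = apply_per_group([[1, 2, 3], [10, 20]]) do g
    cumsum(g)
end
```

The last expression in the block is its return value, which would explain why the quoted code ends the block with a bare `sub_df`.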

On Tuesday, May 3, 2016 at 10:07:10 PM UTC-4, Cedric St-Jean wrote:
>
> Something like 
>
> using RDatasets
> df = dataset("datasets", "iris")
> df[:cumulative_PetalLength] = 0.0
> by(df, :Species) do sub_df
>     sub_df[:cumulative_PetalLength] = cumsum(sub_df[:PetalLength])
>     sub_df
> end
>
> though I hope someone can provide a more elegant solution. `sub_df` is a 
> SubDataFrame, and those objects can neither have a new column added nor be 
> converted to a DataFrame.
>
> On Tuesday, May 3, 2016 at 4:22:29 PM UTC-4, Ben Southwood wrote:
>>
>> I have the following dataframe with values of the form
>>
>> date1,label1,qty1_1
>> date2,label1,qty1_2
>> date3,label1,qty1_3
>> ....
>> dateN,label1,qty1_N
>> date1,label2,qty2_1
>> date2,label2,qty2_2
>> date3,label2,qty2_3
>> ....
>> dateN,label2,qty2_N
>> ....
>>
>>
>>
>> I would like to take the cumulative sum of the qtys within each label, 
>> so that the running total restarts for each new label. I'd then have
>>
>> date1,label1,cuml1_1
>> date2,label1,cuml1_2
>> date3,label1,cuml1_3
>> ....
>> dateN,label1,cuml1_N
>> date1,label2,cuml2_1
>>
>>
>>
>> This way I can use gadfly and run the following plot
>>
>>
>> plot(x=grouped[:date],y=grouped[:cuml_sum],color=grouped[:label],Geom.line)
>>
>>
>> and have each label's cuml sum plotted against date in its own colour.  
>> I'm stuck on how to do this simply without creating lookups. Any help? Thanks!
>>
>>
>>
