Thanks for all the help!
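
For anyone finding this thread later, here is a minimal sketch of the split-apply-combine idea discussed below, written in plain Julia with no packages. `apply_by_group` is a hypothetical helper made up for illustration, not a DataFrames.jl function, and the labels and quantities are invented sample data. It assumes the rows for each label are already in date order.

```julia
# Hypothetical helper (not part of DataFrames.jl): split `values` by `labels`,
# call `f` on each group, and concatenate the results group by group,
# in order of each label's first appearance.
function apply_by_group(f, labels, values)
    result = Float64[]
    for label in unique(labels)
        group = values[labels .== label]  # rows belonging to this label
        append!(result, f(group))         # combine step: concatenate
    end
    return result
end

# Made-up sample data in the "long" layout from the original question.
labels = ["a", "a", "b", "b", "b"]
qty    = [1.0, 2.0, 10.0, 20.0, 30.0]

# The do-block defines an anonymous function and passes it as the
# first argument of apply_by_group.
cuml = apply_by_group(labels, qty) do group
    cumsum(group)  # the last expression is the block's return value
end

# The same call without do-block syntax.
cuml2 = apply_by_group(group -> cumsum(group), labels, qty)

# cuml == cuml2 == [1.0, 3.0, 10.0, 30.0, 60.0]
```

The running total restarts at each new label, which is the shape needed for the per-label line plot.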
On Wednesday, May 4, 2016 at 1:32:19 PM UTC-4, Sam Urmy wrote:
>
> df = @linq df |>
>     groupby(:Species) |>
>     transform(cs = cumsum(:PetalLength))
>
> You can also use the @linq macro to pipe the output from one operation to
> the next, which often reads more clearly than nesting the function calls
> and is a little closer to the Pandas syntax.
>
> On Wednesday, May 4, 2016 at 9:23:26 AM UTC-4, Cedric St-Jean wrote:
>>
>> That's way better, thank you!
>>
>> I never thought I'd say this, but I miss pandas. I could write
>>
>> df['cs'] = df.groupby('Species')['PetalLength'].cumsum()
>>
>> That's not possible in Julia because DataFrames don't have a row index.
>>
>> On Wednesday, May 4, 2016 at 9:04:21 AM UTC-4, tshort wrote:
>>>
>>> Here's another way with DataFramesMeta [1]:
>>>
>>> using DataFrames, DataFramesMeta, RDatasets
>>> df = dataset("datasets", "iris")
>>> @transform(groupby(df, :Species), cs = cumsum(:PetalLength))
>>>
>>>
>>>
>>> [1] https://github.com/JuliaStats/DataFramesMeta.jl/
>>>
>>>
>>>
>>> On Wed, May 4, 2016 at 8:09 AM, Cedric St-Jean wrote:
>>>
>>>> "Do blocks" are one of my favourite things about Julia, they're
>>>> explained in the docs
>>>> <http://docs.julialang.org/en/release-0.4/manual/functions/#do-block-syntax-for-function-arguments>.
>>>>
>>>> Basically it's just a convenient way of defining and passing a function
>>>> (the code that comes after `do`) to another function (in this case, `by`).
>>>> `by` goes over the dataframe, splits it into 3 subdataframes (one for each
>>>> Species in the iris dataset), and calls the do-block for each of them.
>>>> Then their return values (the last expression in the do-block) get
>>>> concatenated together to form the final result. The code I really wanted
>>>> to write is:
>>>>
>>>> using RDatasets
>>>> df = dataset("datasets", "iris")
>>>> # For each species
>>>> df2 = by(df, :Species) do sub_df
>>>>     sub_df = copy(sub_df) # don't modify the original dataframe
>>>>     # Add a :cumulative_PetalLength column
>>>>     sub_df[:cumulative_PetalLength] = cumsum(sub_df[:PetalLength])
>>>>     # Return the new sub-dataframe
>>>>     sub_df
>>>> end
>>>>
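>>>> but unfortunately, this code doesn't work with DataFrames.jl, because
>>>> `sub_df` is a SubDataFrame and can neither have a new column added nor
>>>> be converted to a DataFrame.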
>>>>
>>>>
>>>> On Wednesday, May 4, 2016 at 4:42:41 AM UTC-4, Ben Southwood wrote:
>>>>>
>>>>> Thanks Cedric, that worked very well. I'm having a little trouble
>>>>> following the documentation as to how the "by ... do ..." structure
>>>>> actually works. Would you mind explaining what the code is doing?
>>>>>
>>>>> On Tuesday, May 3, 2016 at 10:07:10 PM UTC-4, Cedric St-Jean wrote:
>>>>>>
>>>>>> Something like
>>>>>>
>>>>>> using RDatasets
>>>>>> df = dataset("datasets", "iris")
>>>>>> df[:cumulative_PetalLength] = 0.0
>>>>>> by(df, :Species) do sub_df
>>>>>>     sub_df[:cumulative_PetalLength] = cumsum(sub_df[:PetalLength])
>>>>>>     sub_df
>>>>>> end
>>>>>>
>>>>>> though I hope someone can provide a more elegant solution. `sub_df` is
>>>>>> a SubDataFrame, and those objects can neither have a new column added
>>>>>> nor be converted to a DataFrame.
>>>>>>
>>>>>> On Tuesday, May 3, 2016 at 4:22:29 PM UTC-4, Ben Southwood wrote:
>>>>>>>
>>>>>>> I have the following dataframe with values of the form
>>>>>>>
>>>>>>> date1,label1,qty1_1
>>>>>>> date2,label1,qty1_2
>>>>>>> date3,label1,qty1_3
>>>>>>> ....
>>>>>>> dateN,label1,qty1_N
>>>>>>> date1,label2,qty2_1
>>>>>>> date2,label2,qty2_2
>>>>>>> date3,label2,qty2_3
>>>>>>> ....
>>>>>>> dateN,label2,qty2_N
>>>>>>> ....
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I would like to take the cumulative sum of the qtys so that the
>>>>>>> running total accumulates separately within each label. Then I'd have
>>>>>>>
>>>>>>> date1,label1,cuml1_1
>>>>>>> date2,label1,cuml1_2
>>>>>>> date3,label1,cuml1_3
>>>>>>> ....
>>>>>>> dateN,label1,cuml1_N
>>>>>>> date1,label2,cuml2_1
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> This way I can use gadfly and run the following plot
>>>>>>>
>>>>>>>
>>>>>>> plot(x=grouped[:date],y=grouped[:cuml_sum],color=grouped[:label],Geom.line)
>>>>>>>
>>>>>>>
>>>>>>> and have each label's cuml sum plotted by date in its own colour. I'm
>>>>>>> stuck on how to do this simply without creating lookups. Any help?
>>>>>>> Thanks!
>>>>>>>
>>>>>>>
>>>>>>>
>>>