[julia-users] DataFrame: aggregate with only one column possible?

Fred Fri, 27 May 2016 10:57:07 -0700

Hi,

I have a DataFrame df2 whose last column is sum = b + c:


julia> df2
8x4 DataFrames.DataFrame
│ Row │ a │ b │ c         │ sum       │
┝━━━━━┿━━━┿━━━┿━━━━━━━━━━━┿━━━━━━━━━━━┥
│ 1   │ 1 │ 2 │ -0.163564 │ 1.83644   │
│ 2   │ 2 │ 1 │ 0.731038  │ 1.73104   │
│ 3   │ 3 │ 2 │ 0.0951149 │ 2.09511   │
│ 4   │ 4 │ 1 │ 0.195321  │ 1.19532   │
│ 5   │ 1 │ 2 │ 1.97058   │ 3.97058   │
│ 6   │ 2 │ 1 │ 0.150826  │ 1.15083   │
│ 7   │ 3 │ 2 │ 0.422046  │ 2.42205   │
│ 8   │ 4 │ 1 │ -1.36549  │ -0.365486 │

We can see that column a has duplicates (1, 2, 3, 4). I am trying to find a 
simple way to remove, for each value of a, the duplicate rows that have the 
lowest sum, without changing the values of b and c.

I tried:

julia> aggregate(df2, :a,  maximum)


4x4 DataFrames.DataFrame
│ Row │ a │ b_maximum │ c_maximum │ sum_maximum │
┝━━━━━┿━━━┿━━━━━━━━━━━┿━━━━━━━━━━━┿━━━━━━━━━━━━━┥
│ 1   │ 1 │ 2         │ 1.97058   │ 3.97058     │
│ 2   │ 2 │ 1         │ 0.731038  │ 1.73104     │
│ 3   │ 3 │ 2         │ 0.422046  │ 2.42205     │
│ 4   │ 4 │ 1         │ 0.195321  │ 1.19532     │



But this is not what I want: I don't want b_maximum and c_maximum, only 
sum_maximum, with b and c taken unchanged from the row that has it:


Row │ a │ b │ c │ sum_maximum


I don't think there is a simple way to do that, but I am asking just 
in case ;)
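
For reference, one possible way to express "keep, for each value of a, the 
whole row with the largest sum" is sketched below. It assumes a recent 
DataFrames.jl (1.x) and uses groupby/combine instead of the aggregate call 
shown above; the name best is only illustrative. The data is copied from 
the table in this post, with sum recomputed as b + c.

```julia
using DataFrames

# Data copied from the table above; sum is recomputed as b + c.
df2 = DataFrame(
    a = [1, 2, 3, 4, 1, 2, 3, 4],
    b = [2, 1, 2, 1, 2, 1, 2, 1],
    c = [-0.163564, 0.731038, 0.0951149, 0.195321,
         1.97058, 0.150826, 0.422046, -1.36549],
)
df2.sum = df2.b .+ df2.c

# For each group of rows sharing the same a, return the single row whose
# sum is largest. b and c come from that row and are not aggregated.
best = combine(groupby(df2, :a)) do sdf
    sdf[argmax(sdf.sum), :]
end

println(best)
```

Because the whole row is selected rather than each column being reduced 
separately, b and c stay consistent with the sum they produced. If two rows 
in a group tie on sum, argmax keeps the first one.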

Thank you very much in advance!
