Does

    by(df, :a) do subdf
        subdf[indmax(subdf[:sum_max]), :]
    end

work? I would suggest reading about the "split-apply-combine" strategy. It's useful vocabulary for talking about these things, although some of it was quite confusing to me at first...

On Saturday, May 28, 2016 at 3:25:31 AM UTC-4, Fred wrote:
> Thank you Cedric!
>
> To clarify, here is an example:
>
> │ Row │ a │ b │ c  │ sum │
> ┝━━━━━┿━━━┿━━━┿━━━━┿━━━━━┥
> │ 1   │ X │ 2 │ 10 │ 12  │
> │ 2   │ Y │ 1 │ 3  │ 4   │
> │ 3   │ Z │ 2 │ 5  │ 7   │
> │ 4   │ X │ 1 │ 20 │ 21  │
> │ 5   │ X │ 2 │ 5  │ 7   │
> │ 6   │ Z │ 1 │ 8  │ 9   │
>
> I want to obtain:
>
> │ Row │ a │ b │ c  │ sum_max │
> ┝━━━━━┿━━━┿━━━┿━━━━┿━━━━━━━━━┥
> │ 1   │ X │ 1 │ 20 │ 21      │
> │ 2   │ Y │ 1 │ 3  │ 4       │
> │ 3   │ Z │ 1 │ 8  │ 9       │
>
> You can see that the rows themselves are unchanged, just filtered to keep
> the one with the maximum sum. In particular, column b contains only "1".
> With aggregate(df2, :a, maximum) this is not the case, because I would also
> get the maximum of b (2, 1, 2) and of c. When I have duplicates in column a
> (X, X, X), for example:
>
> │ Row │ a │ b │ c  │ sum_max │
> ┝━━━━━┿━━━┿━━━┿━━━━┿━━━━━━━━━┥
> │ 1   │ X │ 2 │ 10 │ 12      │
> │ 4   │ X │ 1 │ 20 │ 21      │
> │ 5   │ X │ 2 │ 5  │ 7       │
>
> I want to remove rows 1 and 5 because their sum is lower than that of
> row 4. So the result is:
>
> │ Row │ a │ b │ c  │ sum_max │
> ┝━━━━━┿━━━┿━━━┿━━━━┿━━━━━━━━━┥
> │ 4   │ X │ 1 │ 20 │ 21      │
>
> I don't want b_max and c_max, only sum_max. I hope my explanation is
> clearer now :)
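To make the split-apply-combine idea concrete without depending on a particular DataFrames.jl version, here is a minimal Base-only Julia sketch. It assumes the table is held as a vector of NamedTuples (the `rows` data mirrors Fred's example; `split_by` and `keep_max_row` are hypothetical helper names, not DataFrames API). It splits the rows by column `a`, applies "pick the whole row with the largest `sum`" to each group, and combines the winners in first-seen key order, so columns b and c come from the winning row instead of being maximized separately.

```julia
# Hypothetical data mirroring Fred's example table (columns a, b, c, sum).
rows = [
    (a = "X", b = 2, c = 10, sum = 12),
    (a = "Y", b = 1, c = 3,  sum = 4),
    (a = "Z", b = 2, c = 5,  sum = 7),
    (a = "X", b = 1, c = 20, sum = 21),
    (a = "X", b = 2, c = 5,  sum = 7),
    (a = "Z", b = 1, c = 8,  sum = 9),
]

# Split: group rows by the value of column `key`, remembering the order
# in which each key is first seen.
function split_by(rows, key)
    T = eltype(rows)
    groups = Dict{Any,Vector{T}}()
    order = Any[]
    for r in rows
        k = getproperty(r, key)
        if !haskey(groups, k)
            groups[k] = T[]
            push!(order, k)
        end
        push!(groups[k], r)
    end
    return order, groups
end

# Apply + combine: for each group keep the entire row whose `col` value is
# largest (argmax returns the first index on ties), then collect the winners.
function keep_max_row(rows, key, col)
    order, groups = split_by(rows, key)
    best(g) = g[argmax([getproperty(r, col) for r in g])]
    return [best(groups[k]) for k in order]
end

result = keep_max_row(rows, :a, :sum)
foreach(println, result)
```

On Fred's data this keeps original rows 4, 2 and 6, which is the table he asked for. In later Julia and DataFrames.jl releases `indmax` was renamed `argmax` and `by` was superseded by `combine(groupby(df, :a)) do subdf ... end`, so check which spelling your installed version expects.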
