[julia-users] Re: DataFrame : aggregate with only one column possible ?

Cedric St-Jean Sat, 28 May 2016 04:14:12 -0700

Does 

by(df, :a) do subdf
   # Within each group of `a`, keep the whole row whose sum_max is largest
   subdf[indmax(subdf[:sum_max]), :]
end


work? I would suggest reading about the "split-apply-combine" strategy; 
it is useful vocabulary for talking about these things, although I found 
some of it quite confusing at first...
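
To make the split-apply-combine idea concrete, here is a minimal sketch in plain Julia, without DataFrames, using the rows from the example quoted below. The names `rows`, `groups`, `best_row` and `result` are illustrative assumptions, not part of any library: the rows are split by column `a`, the row with the largest `sum` is picked in each group, and the picks are combined into one list.

```julia
# Rows of the example table as plain named tuples (no DataFrames needed).
rows = [
    (a = "X", b = 2, c = 10, sum = 12),
    (a = "Y", b = 1, c = 3,  sum = 4),
    (a = "Z", b = 2, c = 5,  sum = 7),
    (a = "X", b = 1, c = 20, sum = 21),
    (a = "X", b = 2, c = 5,  sum = 7),
    (a = "Z", b = 1, c = 8,  sum = 9),
]

# Split: collect the rows of each group, keyed by the value of column `a`.
groups = Dict{String, Vector{eltype(rows)}}()
for row in rows
    push!(get!(groups, row.a, eltype(rows)[]), row)
end

# Apply: within one group, return the entire row whose `sum` is largest,
# so `b` and `c` come from that same row rather than being maximised separately.
best_row(group) = group[argmax([r.sum for r in group])]

# Combine: one selected row per group, ordered by group key.
result = [best_row(groups[key]) for key in sort(collect(keys(groups)))]

for r in result
    println(r)
end
```

Selecting a whole row per group is what distinguishes this from applying `maximum` to every column independently, which would mix values from different rows.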

On Saturday, May 28, 2016 at 3:25:31 AM UTC-4, Fred wrote:
>
>
> Thank you Cedric !
>
> To clarify, here is an example:
>
> │ Row │ a │ b │ c    │ sum │
> ┝━━━━━┿━━━┿━━━┿━━━━━━┿━━━━━┥
> │ 1   │ X │ 2 │ 10   │ 12  │
> │ 2   │ Y │ 1 │ 3    │ 4   │
> │ 3   │ Z │ 2 │ 5    │ 7   │
> │ 4   │ X │ 1 │ 20   │ 21  │
> │ 5   │ X │ 2 │ 5    │ 7   │
> │ 6   │ Z │ 1 │ 8    │ 9   │
>
> I want to obtain :
>
> │ Row │ a │ b │ c  │ sum_max│
> ┝━━━━━┿━━━┿━━━┿━━━━┿━━━━━━━━┥
> │ 1   │ X │ 1 │ 20 │ 21     │
> │ 2   │ Y │ 1 │ 3  │ 4      │
> │ 3   │ Z │ 1 │ 8  │ 9      │
>
>
>
>
>
>  
> You can see that the rows are unchanged, only filtered to keep the one 
> with the maximum sum in each group. In particular, column b contains only "1". 
> With aggregate(df2, :a, maximum) that is not the case, because I would also 
> obtain the maximum of b (2,1,2) and c. When I have duplicates in column a 
> (X,X,X), for example:
>
>
> │ Row │ a │ b │ c   │sum_max│
> ┝━━━━━┿━━━┿━━━┿━━━━━┿━━━━━━━┥
> │ 1   │ X │ 2 │ 10  │  12   │
> │ 4   │ X │ 1 │ 20  │  21   │
> │ 5   │ X │ 2 │ 5   │  7    │
>
>
>
>
> I want to remove rows 1 and 5 because their sum is lower than that of row 4.
> So the result is:
>
> │ Row │ a │ b │ c  │ sum_max│
> ┝━━━━━┿━━━┿━━━┿━━━━┿━━━━━━━━┥
> │ 4   │ X │ 1 │ 20 │     21 │
>
>
>
> I don't want b_max and c_max, only sum_max. I hope my explanation is now 
> clearer :)
>
>
>
>
