Pandas example of groupby

Laeeth Isharc via Digitalmars-d Mon, 26 Jan 2015 12:00:39 -0800

If "group by" in other languages refers to the latter function, then that means "groupBy" is poorly-named and we need to come up with a better name for it. Changing it to return tuples and what-not seems to be beating around the bush to me.


T

T: you are good with algorithms. In many applications you have a bunch of results and want to summarise them. This is often what the corporate manager is doing with Excel pivot tables, and it is what the groupby function is used for in pandas. See here for a simple tutorial:


http://wesmckinney.com/blog/?p=125

And here for a summary of what pandas can do with data:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.median.html

Is there any reason why we shouldn't add to Phobos: median, ranking, stddev, variance, correlation, covariance, skew, kurtosis, quantile, moving average, exp mov average, rolling window (see pandas)?
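To make a couple of the items on that list concrete, here is a minimal sketch in Python (standard library only, not pandas and not a proposed Phobos API). The function names rolling_mean and exp_moving_average are hypothetical, and the exponential average uses the plain recursive form seeded with the first value, which is only one of several conventions in use.

```python
from collections import deque
from statistics import median


def rolling_mean(values, window):
    """Yield the mean of each full run of `window` consecutive values."""
    buf = deque(maxlen=window)
    total = 0.0
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # this value is evicted by the append below
        buf.append(v)
        total += v
        if len(buf) == window:
            yield total / window


def exp_moving_average(values, alpha):
    """Yield s_t = alpha * x_t + (1 - alpha) * s_(t-1), seeded with x_0."""
    avg = None
    for v in values:
        avg = v if avg is None else alpha * v + (1 - alpha) * avg
        yield avg


data = [1, 2, 3, 4, 5]
print(median(data))
print(list(rolling_mean(data, 3)))
print(list(exp_moving_average(data, 0.5)))
```

Both helpers are lazy generators that make a single pass over the input, which is roughly the shape a range-based version would take.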

I personally am fine with the implementation we have (although, as Ray Dalio would say, I haven't yet earned the right that you should care what I think). All that it means is that you need to sort your results on multiple keys first before passing them to groupby.
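The sort-then-group step can be sketched with Python's itertools.groupby, which groups only adjacent elements with equal keys. I am assuming that is also how the current D groupBy behaves, as this thread suggests. The rows and the (desk, trader) key are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical result rows: (desk, trader, pnl)
rows = [
    ("rates", "ann", 5.0),
    ("fx", "bob", 1.0),
    ("rates", "ann", -2.0),
    ("fx", "cat", 4.0),
    ("fx", "bob", 3.0),
]

key = itemgetter(0, 1)  # multi-key: (desk, trader)


def grouped_totals(rows):
    """Sort on the multi-key, then sum pnl over each run of equal keys."""
    ordered = sorted(rows, key=key)
    return {k: sum(r[2] for r in run) for k, run in groupby(ordered, key=key)}


def unsorted_runs(rows):
    """Group without sorting: equal keys that are not adjacent stay split."""
    return [(k, sum(r[2] for r in run)) for k, run in groupby(rows, key=key)]


print(grouped_totals(rows))      # one entry per distinct (desk, trader)
print(len(unsorted_runs(rows)))  # more runs than distinct keys
```

Without the sort, the same key shows up in several separate runs, which is the surprise someone coming from pandas would hit.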

My question is how much is lost by doing it in two steps (sort, groupby) rather than one. I don't think all that much, but it is not my field. I am also not that bothered, because this comes at the end of processing, not within the inner loop, so for me I don't think it makes a difference for now. If data sets reach Common Crawl-type sizes then it might be different, although I will take D over Java any day, warts and all.
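For comparison, a one-step version would typically accumulate into a hash table keyed on the group. This is a sketch of that general technique in Python, not a description of how pandas does it internally. The function name totals_one_pass and the rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical result rows: (desk, trader, pnl)
rows = [
    ("rates", "ann", 5.0),
    ("fx", "bob", 1.0),
    ("rates", "ann", -2.0),
    ("fx", "cat", 4.0),
    ("fx", "bob", 3.0),
]


def totals_one_pass(rows):
    """Sum pnl per (desk, trader) in a single pass, with no prior sort."""
    totals = defaultdict(float)
    for desk, trader, pnl in rows:
        totals[(desk, trader)] += pnl
    return dict(totals)


print(totals_one_pass(rows))
```

The trade-off is roughly this: the sort costs O(n log n) comparisons but the grouping that follows can stream lazily, while the hash pass is O(n) on average but has to hold an entry for every group in memory and does not return the groups in sorted order.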

In any case, the documentation should be very clear on what groupby does, and how the user can do what he might be expecting to achieve, coming from a different framework.

It would be interesting to benchmark D against pandas (which is implemented in Cython for the key bits) and see how we look.
