Hiya- I use both Pandas and Numpy in my work, but have definitely shifted towards Pandas for various reasons (mostly my code being readable to novice labmates). I've taught both and prefer to use Pandas. Keep in mind that I'm a biologist, mostly working with biologists. Here are my reasons:
1) The commands resemble human language more. Since most learners I work with are novice-ish, this is pretty important. But the intermediate students also seemed to like it, particularly some of the data formatting capabilities.
2) The library is better documented at a novice level, with good 'next steps' on speed-ups and Numpy interoperability.
3) The data, as displayed on the screen, are more human-readable. It's easier to, say, open your data in a spreadsheet and compare what Pandas shows to what's in the spreadsheet as you get the hang of indexing.
4) Easy liberation of data from Excel. I've also found it easier to work with data that have been modified in ways that are more 'human readable' or that have weird annotations in the first few rows, which often cause ragged numbers of columns.
5) In biology, our main competitor is R. Pandas behaves in a way that is more R-like, so it's easier for folks with that language background to make the jump.
6) Relatedly, the ggplot library has been ported to Python and is being designed with Pandas interoperability in mind. I really like ggplot, and just submitted my second paper with pure Python graphics using this library. The library has some name recognition, particularly in social sciences and biology.

But this is definitely a 'know your community' thing. If Numpy is the norm in whatever field, use it. If these are people coming from C or something, they might prefer Numpy. If they're coming from R, Pandas is more likely the way to go.

--a

---------
Graduate Student
Section of Integrative Biology
University of Texas at Austin
1 University Station, C1100
Austin, TX 78712-0254
Phone: 512.940.5761

On Thu, Sep 25, 2014 at 12:24 PM, Azalee Bostroem <[email protected]> wrote:

> Hi Everyone,
>
> Way back in March when I taught at the Berkeley WISE bootcamp I modified
> the intermediate python lesson to use Numpy rather than Pandas. My
> reasoning was the following:
>
> 1. I think that there is a bit more overhead in understanding a Pandas
>    Table class (vs a Numpy array)
> 2. The slicing is a little more straightforward
> 3. Students are likely to use Numpy arrays and I can read data
>    straight into those arrays
> 4. I think Numpy is more prevalent
>
> The downsides:
>
> 1. The Numpy functions to read data can require a lot of massaging to
>    get your data read in correctly. Not an issue at the bootcamp (sorry,
>    workshop) but an issue for future usability for students.
> 2. There isn’t a way to display all of the data as a table (without
>    invoking shell commands). To be clear, I use the unpack keyword so that
>    each column is put into a separate array
>
> I’m curious to hear the perspective of others who use Numpy and/or
> Pandas and those who have taught using either. The audience this is aimed
> at is one that has programmed before, but not in python.
>
> Thanks,
> Azalee
>
> _______________________________________________
> Discuss mailing list
> [email protected]
> http://lists.software-carpentry.org/mailman/listinfo/discuss_lists.software-carpentry.org
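
For anyone who wants a concrete version of the file-reading points in this thread, here is a minimal sketch. It is only an illustration: the inline text, the annotation rows, and the column names `length` and `mass` are made up, and `io.StringIO` stands in for a real file. It reads the same small table once with Pandas and once with Numpy, using the `unpack` keyword mentioned in the quoted message.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: two annotation rows, then a header row and two numeric columns.
RAW = """\
# exported from a spreadsheet
# units: mm, g
length,mass
10.5,2.0
11.0,2.5
12.5,3.0
"""

# Pandas: lines starting with '#' are dropped, and the first remaining
# line becomes the column labels.
df = pd.read_csv(io.StringIO(RAW), comment="#")
mean_mass_pd = df["mass"].mean()

# Numpy: skip the two annotation rows and the header by count.
# unpack=True returns one array per column, so the labels live only in
# the variable names we choose here.
length, mass = np.loadtxt(
    io.StringIO(RAW), delimiter=",", skiprows=3, unpack=True
)
mean_mass_np = mass.mean()

print(df)  # displayed as a labelled table
print(mean_mass_pd, mean_mass_np)
```

Both routes give the same numbers. The difference is mostly in what you keep track of yourself: with Pandas the column labels travel with the data, while with Numpy you count the rows to skip and name each array by hand.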
