Following up on April's points: I think it depends a bit on what you are trying to teach. There are applications for which Pandas is critically useful, and there are applications for which it is useless (for example, if what you have is an actual 2D array).
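To make that distinction concrete, here is a minimal sketch of the two situations. The data, the column names (`species`, `mass_g`) and the variable names are invented for illustration; they are not taken from the lesson material.

```python
import numpy as np
import pandas as pd

# Case 1: an actual 2D array of homogeneous numbers (a grid, an image, ...).
# Plain NumPy slicing and reductions cover this without any extra machinery.
grid = np.arange(12, dtype=float).reshape(3, 4)
row_means = grid.mean(axis=1)   # one mean per row
corner = grid[:2, :2]           # top-left 2x2 block

# Case 2: a labelled table with mixed column types (hypothetical data).
# Named columns and group-wise operations are where Pandas earns its keep.
table = pd.DataFrame({
    "species": ["A", "B", "A", "B"],
    "mass_g": [10.0, 12.5, 11.0, 13.5],
})
mean_mass = table.groupby("species")["mass_g"].mean()
```

In the first case wrapping the array in a DataFrame buys nothing; in the second, doing the grouping by hand on raw arrays would be noticeably more code.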
I just took a quick look at the intermediate Python lessons, and in their current form I think they would work fine either way. The one example that is very Pandas-specific is the use of a patsy formula with the statsmodels regression. I think that would look a bit different with NumPy arrays.

Pandas is becoming so mature and popular now that I think there are whole swaths of new Python users who have no exposure to raw NumPy. But at the same time there are plenty of people who just don't need Pandas and for whom learning Pandas is not going to be helpful when they go back to work.

Might this be a situation where it's worth maintaining parallel sets of lesson material with and without Pandas?

On Thu, Sep 25, 2014 at 11:19 AM, April Wright <[email protected]> wrote:

> Hiya-
>
> I use both Pandas and Numpy in my work, but have definitely shifted
> towards Pandas for various reasons (mostly my code being readable to
> novice labmates). I've taught both and prefer to use Pandas. Keep in mind
> that I'm a biologist, mostly working with biologists. Here are my reasons:
>
> 1) The commands resemble human language more. Since most learners I work
>    with are novice-ish, this is pretty important. But the intermediate
>    students also seemed to like it, particularly some of the data
>    formatting capabilities.
> 2) The library is better documented at a novice level, with good 'next
>    steps' on speed-up and Numpy interoperability.
> 3) The data, as displayed to the screen, is more human-readable. It's
>    easier to, say, open your data in a spreadsheet and compare it to
>    what's in the spreadsheet as you get the hang of indexing.
> 4) Easy liberation of data from Excel, and I've found it easier to work
>    with data that have been modified in ways that are more 'human
>    readable' or have weird annotations in the first few rows, which often
>    cause ragged numbers of columns.
> 5) In biology, our main competitor is R. Pandas behaves in a way that is
>    more R-like, which makes it easier for folks with that language
>    background to make the jump.
> 6) Relatedly, the ggplot library is ported to Python and is being
>    designed with Pandas interoperability in mind. I really like ggplot,
>    and just submitted my second paper with pure Python graphics using this
>    library. The library has some name recognition, particularly in social
>    sciences and biology.
>
> But this is definitely a 'know your community' thing. If Numpy is the norm
> in whatever field, use it. If these are people coming from C or something,
> they might prefer Numpy. If they're coming from R, Pandas is more likely
> the way to go.
>
> --a
> ---------
> Graduate Student
> Section of Integrative Biology
> University of Texas at Austin
> 1 University Station, C1100
> Austin, TX 78712-0254
> Phone: 512.940.5761
>
> On Thu, Sep 25, 2014 at 12:24 PM, Azalee Bostroem <[email protected]>
> wrote:
>
>> Hi Everyone,
>>
>> Way back in March when I taught at the Berkeley WISE bootcamp I modified
>> the intermediate python lesson to use Numpy rather than Pandas. My
>> reasoning was the following:
>>
>> 1. I think that there is a bit more overhead in understanding a
>>    Pandas Table class (vs a Numpy array)
>> 2. The slicing is a little more straightforward
>> 3. Students are likely to use Numpy arrays and I can read data
>>    straight into those arrays
>> 4. I think Numpy is more prevalent
>>
>> The downsides:
>>
>> 1. The Numpy functions to read data can require a lot of massaging to
>>    get your data read in correctly. Not an issue at the bootcamp (sorry,
>>    workshop) but an issue for future usability for students.
>> 2. There isn’t a way to display all of the data as a table (without
>>    invoking shell commands). To be clear, I use the unpack keyword so
>>    that each column is put into a separate array
>>
>> I’m curious to hear the perspective of others who use Numpy and/or
>> Pandas and those who have taught using either. The audience this is aimed
>> at is one that has programmed before, but not in python.
>>
>> Thanks,
>> Azalee
>>
>> _______________________________________________
>> Discuss mailing list
>> [email protected]
>> http://lists.software-carpentry.org/mailman/listinfo/discuss_lists.software-carpentry.org
