Hiya-

I use both Pandas and Numpy in my work, but have definitely shifted towards
Pandas for various reasons (mostly my code being readable to novice
labmates). I've taught both and prefer to use Pandas. Keep in mind that I'm
a biologist, mostly working with biologists. Here are my reasons:

1) The commands resemble human language more. Since most learners I work
with are novice-ish, this is pretty important. But the intermediate
students also seemed to like it, and particularly some of the data
formatting capabilities.
2) The library is better documented at a novice level, with good 'next
steps' on speed-up and Numpy interoperability.
3) The data, as displayed to the screen, is more human-readable. It's
easier to, say, open your data in a spreadsheet and compare what's on
screen to what's in the spreadsheet as you get the hang of indexing.
4) Easy liberation of data from Excel, and I've found it easier to work
with data that have been modified in ways that are more 'human readable' or
have weird annotations in the first few rows, which often cause ragged
numbers of columns.
5) In biology, our main competitor is R. Pandas behaves in a way that is
more R-like, which makes it easier for folks with that language
background to make the jump.
6) Relatedly, the ggplot library is ported to Python and is being designed
with Pandas interoperability in mind. I really like ggplot, and just
submitted my second paper with pure Python graphics using this library. The
library has some name recognition, particularly in social sciences and
biology.
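
A minimal sketch of points 3 and 4, purely as an illustration. The file
contents, column names, and variable names are all made up, and a CSV
string stands in for an Excel sheet so the snippet runs on its own
(pd.read_excel accepts a similar skiprows argument). The Numpy lines show
the unpack approach for comparison.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical file contents: two free-text annotation rows, then a header.
raw = """Measured by lab member A
Units: mm
sample,length,width
s1,4.0,1.5
s2,5.0,2.0
s3,6.0,2.5
"""

# Pandas: skip the annotation rows; the header row becomes column labels.
df = pd.read_csv(io.StringIO(raw), skiprows=2)
print(df)  # displays a labelled table, much like the spreadsheet view

# Columns are selected by name rather than by position.
mean_length = df["length"].mean()

# Numpy: skip the annotations and the header, then unpack the numeric
# columns into one array per column.
length, width = np.loadtxt(
    io.StringIO(raw), delimiter=",", skiprows=3, usecols=(1, 2), unpack=True
)

# Moving from Pandas to a plain Numpy array when that is what you need.
values = df[["length", "width"]].to_numpy()
```

With Pandas the sample names stay attached to the numbers; with Numpy the
text column has to be left out (or read separately) to get numeric arrays.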

But this is definitely a 'know your community' thing. If Numpy is the norm
in whatever field, use it. If these are people coming from C or something,
they might prefer Numpy. If they're coming from R, Pandas is more likely
the way to go.

--a
---------
Graduate Student
Section of Integrative Biology
University of Texas at Austin
1 University Station, C1100
Austin, TX 78712-0254
Phone: 512.940.5761

On Thu, Sep 25, 2014 at 12:24 PM, Azalee Bostroem <[email protected]>
wrote:

> Hi Everyone,
>
> Way back in March when I taught at the Berkeley WISE bootcamp I modified
> the intermediate python lesson to use Numpy rather than Pandas. My
> reasoning was the following:
>
>
>    1. I think that there is a bit more overhead in understanding a Pandas
>    Table class (vs a Numpy array)
>    2. The slicing is a little more straightforward
>    3. Students are likely to use Numpy arrays and I can read data
>    straight into those arrays
>    4. I think Numpy is more prevalent
>
>
> The downsides:
>
>    1. The Numpy functions to read data can require a lot of massaging to
>    get your data read in correctly. Not an issue at the bootcamp (sorry,
>    workshop) but an issue for future usability for students.
>    2. There isn’t a way to display all of the data as a table (without
>    invoking shell commands). To be clear, I use the unpack keyword so that
>    each column is put into a separate array
>
>
> I’m curious to hear the perspective of others who use Numpy and/or
> Pandas and those who have taught using either. The audience this is aimed
> at is one that has programmed before, but not in python.
>
> Thanks,
> Azalee
>
> _______________________________________________
> Discuss mailing list
> [email protected]
>
> http://lists.software-carpentry.org/mailman/listinfo/discuss_lists.software-carpentry.org
>
