Hi,

Regarding Greg's post about how to organize plotting code for data: this is a common issue regardless of who collected the data or how it was generated. I'm an experimental neuroscientist, so I collect the data that I then analyze to test hypotheses and models. After some thrashing about I've settled on the following design (parts of it in common with Greg's):
1. Analysis and plotting are separate. Analysis often takes a lot of CPU time, whereas plotting doesn't. A given analysis can be plotted in many different ways, and I often want to tweak plots without recomputing the data each time. A pragmatic way to do this is to save the analysed data as a pickle file and have the plotting code load it.

2. Analysis code is written to be run non-interactively, using a command-line options package to pass parameters and instructions. This is useful when I want to run the code on remote machines or parallelize it.

3. No GUIs. This has saved me so much time. I just write plotting code that pops up (or saves as PDF) one figure according to command-line options. If I need a new type of figure I copy the code into a new script or module and save it separately. This is much easier to debug than interactive GUIs that do a gazillion things.

4. Source control. Don't delete any code; save it under different folders organized by idea or by date. Months later I always find myself asking, "I made a plot like this. Where is it? I want to see what I did there."

That's the current credo that has helped me waste a little less time when I want to test an idea with my data.

Best
-Kaushik

------------------------------

Message: 7
Date: Mon, 21 Dec 2009 17:42:40 -0500
From: Greg Novak <no...@ucolick.org>
Subject: [Matplotlib-users] Best practices for organizing plotting code?
To: matplotlib-users@lists.sourceforge.net
Message-ID: <ad0d4fcf0912211442x1261b84ar79945c045a1af...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hello,

I do computational science and I think I'm typical in that I've accumulated a huge pile of code to post-process simulations and draw plots. I think the number of lines of plotting code is now greater than the number of lines in the actual simulation code...
The problem with plotting code is that so much of it has such a short lifetime: you have an idea, spend some time writing code to draw the relevant plot, then the plot isn't interesting and you delete the code. Therefore there's little incentive to spend any time making sure that plotting code is at all well-designed. Nevertheless, _some_ of it tends to live a long time and get ever more complicated, and then the lack of design becomes ever more painful as time goes on. You simply don't know at the beginning which code will be thrown away and which will live a long time.

Over the years I've developed my favorite way to organize my plotting code, but it's far from perfect and I'd love to gather ideas from the MPL community.

So, my current "design principles" are basically these:

1) Don't over-design. A simple system that's used consistently is better than a half-implemented complicated system. Furthermore, most plotting code gets thrown away, so keeping overhead down is one of the primary considerations.

2) Keep computation separate from plotting wherever possible. I therefore have functions like "def compute_optical_depth(...)" that compute the physical quantities to be plotted and "def plot_optical_depth(...)" that handle everything about the visual appearance of the plot. Then when I want to draw some other plot involving optical depth, the calculation is already neatly packaged into a function.

3) Keep annotations, axis labels, legends, etc. separate from the code that actually draws the lines on the axes. This allows you to compose plots to a certain extent. I often find myself saying "I want plot B to look just like plot A but with this extra information, extra lines, extra annotation, or whatever." If the function that draws plot A just puts the data on the axes without axis labels, etc., then the function that draws plot B can easily use it directly.
If the function that draws plot A _also_ draws a bunch of annotations and labels, then the function that draws plot B must either get rid of them or hope they still make sense in the new context.

4) Don't put clf() and cla() all over the place. When working interactively, it's very tempting to put a clf() into every function that draws a plot in order to save a few keystrokes. However, plots don't know the context into which they're being drawn, so they have no authority to clear the screen. They may "own" the whole plotting window, or they may be incorporated into a larger context. The function that worries about axis labels, annotations, and titles is allowed to call cla(). The function that worries about subplots is allowed to call clf(). If you might use the code over a slow link (e.g. connecting to a supercomputing site via residential DSL) then no function should call draw(); that's the user's job.

The upshot of these is that I end up with four layers of functions:

1) compute_physical_quantity(...): just handles numbers

2) draw_physical_quantity(...): has calls to pylab.plot() handling colors, linestyles, etc., but not annotations

3) some_plot(...): has calls to draw_physical_quantity(), some_related_physical_quantity(), along with axis labels, annotations, legends, and pylab.cla()

4) some_figure(...): has multiple panels with calls to pylab.subplot(), pylab.clf(), some_plot_a(), some_plot_b(), etc.

Sometimes I'm lazy and combine layers 2 and 3, when layer 2 would really be just a single call to pylab.plot().

Please remember that I'm not writing these down because I think they're so great that everyone needs to know about them. I'm hoping that people will respond with much better ideas that I can adopt for myself.
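A minimal sketch of the four layers described above, under some stated assumptions. The exponential decay is only a placeholder for a real calculation, and the parameter names (n_points, scale, scales) are hypothetical. It also passes explicit Axes and Figure objects into each function rather than using the pylab state-machine calls named in the message; that is a swapped-in variant, chosen because it makes "who is allowed to clear what" visible in the function signatures.

```python
import math


def compute_physical_quantity(n_points=50, scale=2.0):
    """Layer 1: numbers only, no plotting calls.

    The exponential decay is a placeholder for a real calculation.
    """
    xs = [10.0 * i / (n_points - 1) for i in range(n_points)]
    ys = [math.exp(-x / scale) for x in xs]
    return xs, ys


def draw_physical_quantity(ax, scale=2.0, **line_style):
    """Layer 2: put one curve on the given axes.

    Handles colors, linestyles, etc. via line_style, but adds no labels
    and never clears anything.
    """
    xs, ys = compute_physical_quantity(scale=scale)
    return ax.plot(xs, ys, **line_style)


def some_plot(ax, scales=(1.0, 2.0, 4.0)):
    """Layer 3: compose curves, then add axis labels and a legend.

    This layer owns the axes, so it is the one allowed to call cla().
    """
    ax.cla()
    for scale in scales:
        draw_physical_quantity(ax, scale=scale, label=f"scale = {scale}")
    ax.set_xlabel("x")
    ax.set_ylabel("quantity")
    ax.legend()


def some_figure(fig):
    """Layer 4: own the whole figure, so clf() and panel layout live here."""
    fig.clf()
    ax_a = fig.add_subplot(1, 2, 1)
    ax_b = fig.add_subplot(1, 2, 2)
    some_plot(ax_a)
    some_plot(ax_b, scales=(0.5,))
    return fig


# Usage (assumes matplotlib is installed); drawing and saving are left
# to the caller, as the message recommends:
#   import matplotlib.pyplot as plt
#   fig = some_figure(plt.figure())
#   fig.savefig("quantity.pdf")
```

Because each layer only touches the object it is handed, some_plot() can be reused unchanged in a one-panel or a multi-panel figure, and no function calls draw() or show() on the caller's behalf.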
Thanks,
Greg

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users