publication quality graphs

Stuart Prescott Wed, 12 Oct 2005 20:58:06 -0700

Hi all,

A question that has come up a few times on this list is how people go
about producing publication quality graphs. I'm revisiting this question
as I'm yet to find a method that actually works for me. Part of this is
that I am used to doing things in a particular way (which might have to
change!) and part of this is shortcomings in various packages that I've
tried. After a day or so of frustration with any given app, I end up
going back to Origin (Origin6 under WINE mostly works).


Here, I'd like to describe how I normally plot data and why the various
apps that I've tried don't work for me (below).

Hopefully, this will incite some discussion and regular users of these
apps can suggest some add-ons or simple changes to my workflow that will
turn them from being a pain in the backside to being a useful utility.
Or perhaps this will encourage someone (me?) to hack at these apps until
they become more useful to people like me.

All comments welcome!

cheers
Stuart



* current workflow using Origin
I normally have data from a number of different (but related)
experiments in the one file (perhaps in X1 Y1 X2 Y2 or X Y1 Y2 Y3 format)
that has been exported from OOo calc or Excel or perhaps from a data
analysis program that I have written for the purpose. The files are
usually tab-delimited and I can just Import ASCII in Origin and all is
well. It sets the column names to be the headers from the text file. I
can then create plots with some or all of the data and will frequently
want to add new data to a plot from another worksheet trivially etc. In
the end I produce an EPS graphic for use in LaTeX or a PNG graphic for
MSWord/MSPowerpoint. I like the "project" paradigm of Origin where data,
settings and graphs are all together allowing you to add new data to
plots, plot things in different ways etc. I don't just make the one
final publication quality graph in an Origin project; rather, I usually
have several different publication graphs in the one project (and often
both print and presentation versions of these graphs too) as well as
other graphs made while I explore the data. I tend not to use Origin for
data manipulation as it is much more cumbersome than using a proper
spreadsheet program like Excel or OOo calc.

* OOo calc
Ha. Nice thought. Produces graphs that look as bad as M$Excel but can't
even handle X1 Y1 X2 Y2 data (only X Y1 Y2). Non-starter even for just
having a "quick look" at data, even if it does meet the "project"
paradigm.

* xmgrace
It can't read in any of my data files. Bit of a showstopper, really....
I almost always have data with columns: X Y1 Y2 Y3... or X1 Y1 X2 Y2 X3
Y3... and I *always* have text column headings labelling the data. It
chokes on this sort of thing and I'm not going to manually import
hundreds of individual files (or columns piped through tail -n +2 | cut -f
etc) through that tedious import dialogue.
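
As a rough illustration of the preprocessing that the tail | cut route would otherwise need, here is a small Python sketch that splits a tab-delimited X1 Y1 X2 Y2 file with a header row into one series per column pair. The function name split_xy_pairs and the sample data are made up for this example. It assumes the columns come strictly in X, Y pairs and that shorter series are padded with empty cells, which may not hold for every exported file.

```python
import csv
import io


def split_xy_pairs(text, delimiter="\t"):
    """Split 'X1 Y1 X2 Y2 ...' columns into one series per column pair.

    Returns a list of (y_header, [(x, y), ...]) tuples. A list is used
    rather than a dict so that duplicate column headers do not collide.
    Empty cells (from shorter series) are skipped rather than shifted.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, body = rows[0], rows[1:]
    series = []
    for col in range(0, len(header) - 1, 2):
        points = []
        for row in body:
            # Pad short rows so a missing trailing cell reads as empty.
            row = row + [""] * (len(header) - len(row))
            x, y = row[col].strip(), row[col + 1].strip()
            if x and y:
                points.append((float(x), float(y)))
        series.append((header[col + 1], points))
    return series


# Hypothetical sample: the first pair has 2 rows, the second has 3.
sample = (
    "t1\trun A\tt2\trun B\n"
    "0\t1.0\t0\t2.0\n"
    "1\t1.5\t1\t2.5\n"
    "\t\t2\t3.0\n"
)

for name, points in split_xy_pairs(sample):
    print(name, points)
```

Each series could then be written to its own two-column file, or passed straight to a plotting script, without going through an import dialogue once per column.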

* qtiplot
In a file with X1 Y1 X2 Y2 etc where the data streams are different
lengths, it chokes... all the data is left-aligned in the import (i.e.
if X1 Y1 has 20 rows but X2 Y2 has 30 rows, X1 Y1 will "gain" an extra
10 rows of data at the expense of X2 Y2). It also doesn't permit you to
define multiple X columns per data sheet or edit an existing plot to
change which column is to be used for the X or Y data etc (which is
useful in transferring settings from one plot to another in the absence
of styles or templates). Finally, the plots don't scale to the size of
the window (there is no defined "page" size) so if you make the window
bigger, then you have to manually increase all the font sizes yourself.

* labplot
Shares many of the same bugs/problems as qtiplot. But you also can't add
a new curve to an existing graph unless you read it in from a data file
directly or it is a mathematical function (i.e. you can't use data
already in a data sheet). Column headers are also discarded so you'd
have to go back and relabel all the columns. (OK, on the 1 week time
scale you can just remember them, but on the 1 year timescale you need
them labelled, and I always assume that I'm going to have to come back
to it on that timescale as it can be that long between doing the work
and publishing it.) It also can't generate smooth curves (e.g. splines)
between data points.

* scigraphica
Importing X1 Y1 X2 Y2 data into a sheet causes the data to be truncated
at the number of rows in X1. Column headers are imported, but if more
than one column has the same name then only the left-most column is used
when you come to graph that dataset.

* gnuplot
I'll confess that I have a deep seated aversion to gnuplot that dates
back to my undergraduate days which is probably unfair. Having said
that, I do prefer to be able to manipulate the graphs in real time *and*
be able to save the settings and data as a project. With gnuplot you can
do one or the other (either run it from a script which is pretending to
be a "project" file of some sort, or you can do it in real time.)

* gri
Works great for me for quick scriptable graphing to condense the data
from several hundred simulations into a few graphs with minimal work.
But you can't quite get production quality graphs out of it (e.g. you
can't have non-italicised subscripts or superscripts).

* pyx
This has the same non-real-time and non-project problem of gnuplot, but
I have to say that of all these utils, this is the one I am most likely to
adopt. I was able to get much better results with this than the other
tools, mainly because it can read in arbitrary data structures. But I'm
not sure it's really sustainable for long-run usage as configuring a
plot that isn't *quite* what you want is a huge amount of work. The
biggest problem I had (showstopper!), however, is that it can't generate
any form of non-linear line between data points (e.g. spline or
b-spline). That makes for unacceptably ugly graphs. 

* others?
Are there any other utils that you can suggest that might meet my
requirements? I'm happy to try things out and will post back reviews of
them too when I have a chance.


** conclusions
The three main things that I can condense out of this are that:

* there is nothing in linux land that even comes close to Origin for
flexible scientific graphing and data management. That's a pity... linux
leads in everything else, but I know very few other people who will put
the sort of time in that I have done in trying to get this to work.

* the X1 Y1 X2 Y2 data format is a problem.... many of these utilities
would work better if I wasn't using that. However, dropping that format
comes at the expense of me having to do a lot more work to split up
files and then import them individually.

* generating smooth-curve data is a problem... I'll play around with
spline(1) from the plotutils package and aspline(1) from the spline
package to see if that could be a viable filter. But having to filter
all the data through an external spline program is somewhat suboptimal.
Perhaps there is a Python module for this that will work with PyX?
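
One way the smooth-curve filtering could be done without an external program is a natural cubic spline written in plain Python, sketched below. This is only an illustration under stated assumptions: the names natural_cubic_spline and sample_curve are invented for this example, the X values must be strictly increasing, and a natural spline (zero second derivative at both ends) is just one choice of smoothing and not necessarily what Origin draws. SciPy's scipy.interpolate.CubicSpline provides a library alternative if SciPy is available.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    xs, ys = list(xs), list(ys)
    n = len(xs) - 1
    if n < 1 or len(ys) != len(xs):
        raise ValueError("need at least two points and equal-length inputs")
    if any(xs[i + 1] <= xs[i] for i in range(n)):
        raise ValueError("x values must be strictly increasing")

    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])

    # Forward sweep of the tridiagonal system for the quadratic coefficients.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]

    # Back substitution; c[0] and c[n] stay zero (the "natural" end condition).
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        # Pick the interval containing x, clamping to the first or last one.
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[j]
        return ys[j] + dx * (b[j] + dx * (c[j] + dx * d[j]))

    return evaluate


def sample_curve(xs, ys, samples=50):
    """Return `samples` evenly spaced (x, y) points along the spline."""
    spline = natural_cubic_spline(xs, ys)
    step = (xs[-1] - xs[0]) / (samples - 1)
    return [(xs[0] + k * step, spline(xs[0] + k * step)) for k in range(samples)]


# Hypothetical data: four measured points turned into a dense smooth curve.
curve = sample_curve([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 0.0, 1.0])
print(len(curve), "points from", curve[0], "to", curve[-1])
```

The densely sampled points can be plotted as an ordinary line in whichever tool is in use, with the original data drawn as symbols on top.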



(Note that this is not to say that Origin is without faults. Apart from
the ideological reasons for moving to free software, the EPS export in
Origin6 doesn't clip the data to within the axes which was the final
motivation for me looking elsewhere. Eventually, I found a workaround
by printing to a PS file and then using ps2epsi but that's not a great
solution.)

