It is the lack of compatibility between different versions mentioned by
Ethan that really put me off learning Python. In contrast, the
FORTRAN-66 program SHELX76 still compiles and runs correctly with any
modern FORTRAN compiler. The only significant 'new' features that I now
use are dynamic array allocation (introduced in FORTRAN-90) and OpenMP
support for multiple CPUs, but even programs using OpenMP would still
work with older compilers because the OpenMP instructions would be
treated as comments.
George
On 09/12/2012 08:28 PM, Ethan Merritt wrote:
On Wednesday, September 12, 2012 09:52:09 am Jacob Keller wrote:
For the specific purpose you list -
input from tab-delimited data
output to simple statistical summaries and (I assume) plots
- it sounds like gnuplot could do the job nicely.
I wasn't aware that gnuplot can do calculations--can it? I was probably
going to use it somewhere as a plotting option.
Here's a simple-minded example using a dump of the current contents
of the PDB from www.pdb.org as a comma-separated file with ~65000 entries.
The input file was previously filtered to contain only X-ray structures
between 1 and 4 Angstroms resolution.
gnuplot> !head -3 PDB.csv
PDB ID,R Observed,R All,R Work,R Free,Refinement Resolution
"100D","0.145","","0.145","","1.90"
"101D","0.163","","","0.252","2.25"
gnuplot> set datafile separator ","
gnuplot> set datafile nofpe_trap   # trap handling greatly slows large data sets
gnuplot> stats 'PDB.csv' using "R Observed" prefix "Robs"
* FILE:
Records: 63029
Out of range: 0
Invalid: 0
Blank: 2
Data Blocks: 2
* COLUMN:
Mean: 0.1982
Std Dev: 0.0334
Sum: 12494.6900
Sum Sq.: 2547.3068
Minimum: 0.0450 [24518]
Maximum: 0.9700 [45024]
Quartile: 0.1770
Median: 0.1970
Quartile: 0.2180
gnuplot> print Robs_mean
0.198237160672072
gnuplot> #calculate correlation of Robs with Resolution
gnuplot> stats 'PDB.csv' using "R Observed":"Refinement Resolution" nooutput
gnuplot> print STATS_correlation
0.595763711910418
I've attached graphical output of the same data after some sorting,
filtering, binning, etc., with output to a PDF file.
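(Since list archives usually drop attachments: a sketch of the kind of gnuplot
commands that could produce such a PDF, reusing the PDB.csv file from the stats
example above. The bin width, output filename, and the bin() helper are my own
illustrative choices, not necessarily what Ethan actually ran.)

```gnuplot
# Sketch only: mean "R Observed" binned by resolution, from the same
# PDB.csv whose header line is shown earlier in this thread.
set datafile separator ","
set terminal pdfcairo size 5in,3.5in
set output 'Robs_vs_resolution.pdf'
set xlabel "Refinement Resolution (Angstrom)"
set ylabel "R Observed"
# bin resolution into 0.1 A slots; using a string with column() makes
# gnuplot match it against the labels in the file's first line
bin(x,w) = w * (floor(x/w) + 0.5)
plot 'PDB.csv' \
     using (bin(column("Refinement Resolution"),0.1)):(column("R Observed")) \
     smooth unique title "mean R-obs per 0.1 A bin"
unset output
```

"smooth unique" averages all y values sharing the same (binned) x, which is
what turns the raw scatter into a per-bin mean curve.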
You can do all this in R also. R has a larger collection of statistics
options, but is not as good at dealing with really large data sets.
IMHO gnuplot has more flexible options for graphical output.
Otherwise I'd recommend perl, and dis-recommend python.
Why are you dis-ing python? Seems everybody loves it...
I'm sure you can google for many "reasons I hate Python" lists.
Mine would start
1) sensitive to white space == fail
2) dynamic typing makes it nearly impossible to verify program correctness,
and very hard to debug problems that arise from unexpected input or
a mismatch between caller and callee.
3) the language developers don't care about backward compatibility;
it seems version 2.n+1 always breaks code written for version 2.n,
and let's not even talk about version 3
4) sloooow unless you use it simply as a wrapper for C++,
in which case why not just use C++ or C to begin with?
5) not thread-safe
you did ask...
Ethan
--
Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-22582