Jason, that was pretty fast! You delivered the script before someone
else could even suggest PyMOL ;)
If you use
from chempy import cpv
cpv.distance(aa.coord, bb.coord)
instead of
cmd.get_distance( "index %s" % aa.index, "index %s" % bb.index)
it will be significantly faster.
Cheers,
Thomas
On Mar 9, 2011, at 7:43 AM, Jason Vertrees wrote:
Hi Cale,
For any given structure in the PDB, I want to identify all the
Histidine ND1
atoms. I then want to consider these atoms in pairs, measure the
distance
in Angstroms between the ND1 atoms in each pair, and compile these
distances
(along with residue numbers of the pair) in a table. I then want
to repeat
this procedure for each unique structure in the PDB and generate a
table
containing all occurrences of HisND1 pairs with their corresponding
separation distance. Amongst other things, I want e.g. generate a
histogram
from this table and determine e.g. the shortest HisND1 pair distance
observed and the structure in which this happens. Does anyone have
any
suggestions for any tools I might be able to use to perform this
search?
This looked like a fun little PyMOL task. So, I wrote a script to do
it. This will iterate over all the PDBs in a given directory and
calculate the non-redundant distances of all HIS ND1s. A table is
written to pre_hist.csv. Use R or something similar to create the
histogram and do the numerical analysis.
To use this, first, you'll need a prepared copy of the PDB, suitable
for this task, downloaded locally in pdb_path. You will need to set
"pdb_path" in the script below to wherever your PDBs are. (The other
kind folks on this list already pointed you to resources for mirroring
the PDB, if you haven't done it already.) To run the script below
just save it to disk as "his.py" and launch it with:
# from Linux
pymol -cq his.py
# or, from Mac
/Applications/PyMOLX11Hybrid.app/Contents/MacOS/MacPyMOL -cq his.py
If you're a Linux/Mac user, when the script finishes, just type:
sort -n -k8 pre_hist.csv | head -2
into your BASH shell to get the shortest distance. For my example
PDB it was:
$ sort -n -k8 pre_hist.csv | head -2
PDB CHAIN RESI ATOM-A CHAIN RESI ATOM-B DISTANCE
5rla B 187 3729 C 312 7075 2.838208
To view the shortest distance use the ATOM-A and ATOM-B fields from
the output file. From my example above I would need to fetch the
protein and create a distance from "index 3729" to "index 7075".
Here's how:
# fetch the protein
fetch 5rla, async=0
# show the distance
distance index 3729, index 7075
# zoom on the new distance
zoom dist01
Here's the Python script that does the work:
import glob, os, pymol, sys
from pymol import cmd
the_pdb="/Users/vertrees/small_pdb"
files = glob.glob(the_pdb+os.sep+"*.pdb")
if not len(files):
print "Please set 'the_pdb' variable to a valid path containing
PDB files."
sys.exit(1)
else:
print "Processing %d files." % len(files)
s, outFile = "resn HIS and name ND1", "pre_hist.csv"
f = open(outFile, 'wb')
# write the header
f.write("PDB\tCHAIN\tRESI\tATOM-A\tCHAIN\tRESI\tATOM-B\tDISTANCE\n")
# for each file in the mirror
for x in files:
cmd.load(x,finish=1)
n = cmd.get_names()[0]
m = cmd.get_model(s).atom
# pairwise for each atom
for aa in m:
for bb in m:
# avoid duplicates
if aa.index==bb.index: continue
f.write( "%s\t%s\t%s\t%s\t%s\t%s\t%d\t%f\n" %
(n, aa.chain, aa.resi, aa.index,
bb.chain, bb.resi, bb.index,
cmd.get_distance( "index %s" % aa.index,
"index %s" % bb.index)))
cmd.delete(n)
f.close()
print "Processed %d files. Please see %s for results." %
(len(files), outFile)
Cheers,
-- Jason
--
Jason Vertrees, PhD
PyMOL Product Manager
Schrodinger, LLC
(e) [email protected]
(o) +1 (603) 374-7120
--
Thomas Holder
MPI for Developmental Biology
Spemannstr. 35
D-72076 Tübingen