Hi Ed,

Another way to look for point mutations is to use the sequence clusters, which 
are available at 100%, 95%, 90%, etc. sequence identity.

Here is an example, finding mutants of HIV-1 protease:

Start with an example protein structure, e.g., 1OHR. On the structure summary 
page of 1OHR, click the Seq. Similarity tab. Then select the 95% seq. id. 
cluster to retrieve all proteins with >= 95% sequence identity to the query 
structure:
http://www.rcsb.org/pdb/explore/sequenceCluster.do?structureId=1OHR
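To see how many substitutions each cluster level can hold, here is a small sketch. It assumes identity is simply identical positions divided by chain length for a gapless, equal-length comparison; the actual RCSB clustering procedure also involves alignment and coverage criteria, so treat this as an approximation. For the 99-residue HIV-1 protease monomer, a single or double mutant (about 99% and 98% identity) falls comfortably inside the 95% cluster.

```python
def max_substitutions(length: int, identity_pct: int) -> int:
    """Largest number of substitutions k such that (length - k) / length is
    still >= identity_pct / 100, assuming a gapless, equal-length comparison
    (a simplification of how the clusters are really built)."""
    # Integer arithmetic avoids floating-point rounding right at the cutoff.
    return (length * (100 - identity_pct)) // 100


# HIV-1 protease monomer: 99 residues.
for cutoff in (100, 95, 90):
    print(f"{cutoff}% cluster: up to {max_substitutions(99, cutoff)} substitutions")
```

Under this simplified definition the 95% cluster tolerates up to 4 substitutions in a 99-residue chain, while the 100% cluster tolerates none.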

Peter Rose
RCSB PDB

-----Original Message-----
From: CCP4 bulletin board On Behalf Of Ed Pozharski
Sent: Friday, June 22, 2012 6:14 PM
To: [email protected]
Subject: [ccp4bb] pdb sequence search

Silly question.

Say I want to find every structure in the PDB with the exact sequence or with 
perhaps 1-2 mutations.  I know of two ways of doing this.
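Once candidate sequences are in hand, listing the actual point mutations is straightforward when no insertions or deletions are involved. This is a minimal sketch under that assumption (both sequences the same length and already in register); `point_mutations` and `is_close_variant` are hypothetical helper names, and the toy sequences are made up.

```python
def point_mutations(reference: str, candidate: str) -> list[str]:
    """Return substitutions in the conventional 'I6V' form (1-based positions).

    Assumes both sequences are the same length and aligned position by
    position, i.e. no insertions or deletions.
    """
    if len(reference) != len(candidate):
        raise ValueError("sequences must have equal length for a gapless comparison")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, candidate), start=1)
        if ref != alt
    ]


def is_close_variant(reference: str, candidate: str, max_mutations: int = 2) -> bool:
    """True for an exact match or a variant with at most max_mutations substitutions."""
    return len(point_mutations(reference, candidate)) <= max_mutations


print(point_mutations("MKTAYIAK", "MKTAYVAK"))  # toy sequences, one substitution
```

Working with explicit mutation labels rather than a percentage makes "exact sequence or 1-2 mutations" a direct count instead of a threshold that depends on chain length.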

1. Go to NCBI BLAST and run the sequence against the PDB subset.  The resulting 
list will have identities listed, so manual parsing is doable if there aren't 
too many hits.
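If there are too many hits to scan by eye, the parsing can be automated. This sketch assumes the hits were saved in BLAST+ tabular format (`-outfmt 6`, default 12 columns) and that you supply the query length yourself, since that column is not in the default output. The filter counts mismatches plus gap openings; note that a gap opening is counted once regardless of gap length. The function name and thresholds are illustrative choices.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def near_identical_hits(tabular_text: str, query_length: int,
                        max_differences: int = 2, min_coverage: float = 0.95):
    """Yield (subject_id, pident, mismatches, gap_opens, coverage) for hits that
    cover most of the query and differ from it at no more than max_differences
    places (mismatches + gap openings; gap lengths are not counted)."""
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (as in -outfmt 7)
        hit = dict(zip(FIELDS, row))
        mismatches = int(hit["mismatch"])
        gap_opens = int(hit["gapopen"])
        coverage = (int(hit["qend"]) - int(hit["qstart"]) + 1) / query_length
        if coverage >= min_coverage and mismatches + gap_opens <= max_differences:
            yield hit["sseqid"], float(hit["pident"]), mismatches, gap_opens, coverage


# Made-up example rows for a 99-residue query.
example = ("q1\thitA\t98.99\t99\t1\t0\t1\t99\t1\t99\t1e-60\t200\n"
           "q1\thitB\t50.00\t99\t49\t0\t1\t99\t1\t99\t1e-10\t80\n")
print(list(near_identical_hits(example, query_length=99)))
```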

2. PDB and PDBe both have search-by-sequence features.  Trouble is, the 
default E-value seems to be tailored to low sequence identity (which makes 
sense if you are looking for potential MR models).  Sure, I can reduce the 
target E-value, but it's a little cumbersome, and I have no idea what the 
target level should be so that I don't get any 50% identical sequences yet 
don't miss single/double mutants.
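One way to see why no single E-value corresponds to a fixed identity: in its basic Karlin-Altschul form (BLAST applies effective lengths and further corrections, so this is only a sketch), the expected number of chance hits scoring at least S is given below, where m is the query length, n the total database length, and K and lambda depend on the scoring matrix and gap penalties.

```latex
E = K\,m\,n\,e^{-\lambda S}
\qquad\Longrightarrow\qquad
E(2n) = K\,m\,(2n)\,e^{-\lambda S} = 2\,E(n)
```

So the same alignment, with the same number of mutations, gets a different E-value as the database grows, and S itself depends on which residues are substituted rather than just how many.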

Wouldn't it be nice if one could use a sequence identity cutoff and query 
coverage instead?  Much more comprehensible than the E-value.  Is there a 
search engine that does that? It seems like a fairly common need, and perhaps I 
just can't find it on the PDB website.
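For reference, identity and query coverage can both be computed from a pairwise alignment in a few lines. This is a sketch assuming the alignment is given as two equal-length rows with `-` for gaps; identity here uses the number of alignment columns as the denominator, which is only one of several conventions (others divide by the shorter sequence or the query length), so numbers may not match a given server exactly.

```python
def identity_and_coverage(aligned_query: str, aligned_subject: str,
                          query_length: int) -> tuple[float, float]:
    """Return (percent identity, query coverage) for one pairwise alignment.

    Identity = identical non-gap columns / all alignment columns (gaps included).
    Coverage = query residues present in the alignment / full query length.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("alignment rows must have equal length")
    columns = len(aligned_query)
    identical = sum(1 for q, s in zip(aligned_query, aligned_subject)
                    if q == s and q != "-")
    query_residues_aligned = sum(1 for q in aligned_query if q != "-")
    return 100.0 * identical / columns, query_residues_aligned / query_length


def passes(identity: float, coverage: float,
           min_identity: float = 95.0, min_coverage: float = 0.9) -> bool:
    """Hypothetical filter: keep hits above both cutoffs."""
    return identity >= min_identity and coverage >= min_coverage


# Toy alignment: 8 aligned query residues out of a 10-residue query.
print(identity_and_coverage("MKTAYIAK", "MKTAYVAK", query_length=10))
```

Filtering on two intuitive numbers like this is independent of database size, unlike an E-value cutoff.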

Thanks in advance for any suggestions,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
                                                 Julian, King of Lemurs
