Hi Ed,

Another way to look for point mutations is to use the sequence clusters available at 100%, 95%, 90%, etc. sequence identity.
Here is an example, finding mutations for HIV-1 protease: Start with an example protein structure, e.g., 1OHR. On the structure summary page of 1OHR, click on the Seq. Similarity tab. Then select the 95% seq. id cluster to retrieve all proteins with >= 95% seq. identity to the query structure:

http://www.rcsb.org/pdb/explore/sequenceCluster.do?structureId=1OHR

Peter Rose
RCSB PDB

-----Original Message-----
From: CCP4 bulletin board [mailto:[email protected]] On Behalf Of Ed Pozharski
Sent: Friday, June 22, 2012 6:14 PM
To: [email protected]
Subject: [ccp4bb] pdb sequence search

Silly question. Say I want to find every structure in the PDB with the exact sequence or with perhaps 1-2 mutations. I know of two ways of doing this.

1. Go to NCBI BLAST and run the sequence against the PDB subset. The resulting list will have identities listed, so manual parsing is doable if there aren't too many hits.

2. PDB and PDBe both have search-by-sequence features. Trouble is, the default E value seems to be tailored to poor sequence identity (which makes sense if you are looking for potential MR models). Sure, I can reduce the target E value, but it's a little cumbersome, and I have no idea what the target level should be so that I don't get any 50% identical sequences yet don't miss single/double mutants.

Wouldn't it be nice if one could use a sequence identity cutoff/query coverage instead? Much more comprehensible than the E-value. Is there a search engine that does that? Seems like a fairly common need, and perhaps I just can't find it on the PDB website.

Thanks in advance for any suggestions,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs
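
For the BLAST route in option 1 of the quoted message, the manual parsing can be scripted. The following is only a minimal sketch, assuming the hits were saved in BLAST+ tabular format (-outfmt 6 with its default 12 columns) and that you supply the query length yourself. The function name, the thresholds, and the hit IDs in the sample are made up for illustration and are not real search results.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    subject_id: str
    percent_identity: float
    mismatches: int
    gap_opens: int
    query_coverage: float  # fraction of the query covered by the alignment


def filter_hits(lines: Iterable[str], query_length: int,
                min_identity: float = 95.0,
                min_coverage: float = 0.9) -> List[Hit]:
    """Keep hits above an identity and query-coverage cutoff.

    Assumes the default column order of BLAST+ -outfmt 6:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a default-format hit line
        pident = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        coverage = (qend - qstart + 1) / query_length
        if pident >= min_identity and coverage >= min_coverage:
            kept.append(Hit(fields[1], pident, int(fields[4]),
                            int(fields[5]), coverage))
    return kept


# Made-up hits for a 99-residue query (illustration only).
sample = [
    "query\thitA\t100.000\t99\t0\t0\t1\t99\t1\t99\t1e-60\t200",
    "query\thitB\t98.990\t99\t1\t0\t1\t99\t1\t99\t1e-58\t195",
    "query\thitC\t50.000\t98\t49\t0\t1\t98\t1\t98\t1e-20\t80",
    "query\thitD\t100.000\t40\t0\t0\t1\t40\t1\t40\t1e-15\t70",
]

for hit in filter_hits(sample, query_length=99):
    print(hit.subject_id, hit.percent_identity, hit.mismatches)
```

Here hitC is dropped on identity and hitD on coverage, even though hitD is 100% identical over the short stretch it aligns to. For single/double mutants specifically, the mismatch and gap-open columns could be checked directly (e.g. mismatches + gap_opens <= 2) rather than relying on a percentage.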
