Many thanks for all the replies. Unless it appears at the beginning (or end?) of a sequence, "X" seems to be treated as a pseudo-residue that counts toward the homology score (i.e., if two aligned sequences have matching disordered loops denoted by X, that region is scored as identical). I've tried a number of non-alphanumeric characters, but they either cause a crash or are stripped out. What I'd like is a wildcard that keeps the residues in register but flags the region for exclusion from the homology analysis.
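To make concrete what I mean by "exclusion", here is a minimal Python sketch that post-processes an alignment you already have. It is not a BLAST option, and the function name and the choice to skip gap columns as well are my own assumptions. It recomputes percent identity over only those columns where neither sequence carries the X placeholder, so matching X/X loops no longer inflate the score.

```python
def identity_excluding_masked(seq_a, seq_b, mask="X", gap="-"):
    """Percent identity over aligned columns, skipping masked and gap columns.

    seq_a and seq_b are two rows of an existing pairwise alignment (equal
    length). Returns (percent_identity, number_of_columns_compared).
    Hypothetical helper: the name and the gap-skipping rule are assumptions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip columns flagged as missing/disordered in either sequence.
        if a == mask or b == mask:
            continue
        # Skip alignment gaps too (one possible convention, not the only one).
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0, 0
    return 100.0 * matches / compared, compared


if __name__ == "__main__":
    # Toy alignment: a four-residue disordered loop written as XXXX in both rows.
    pct, n = identity_excluding_masked("MKTAXXXXLVG", "MKSAXXXXLVG")
    print(f"{pct:.1f}% identity over {n} compared columns")
```

In the toy case the four X/X columns are dropped, so identity is taken over 7 columns instead of 11, while the residue numbering in the alignment itself stays in register with the PDB file.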
The suggestions about using the sequence information in the PDB header are great, but as mentioned, some PDB files don't have this information. Also, if you have the structure of, say, the C-terminal domain of a protein, the N-terminal residues will not be present in the header.

Many thanks in advance for any further tips/advice!

On Sat, Sep 21, 2013 at 8:42 AM, Mo Wong <[email protected]> wrote:

> Hi,
>
> I'm trying to do sequence alignments that are generated using PDB files as
> the sequence source so are often missing residues with the sequence. Is
> there any way to run BLAST (or related server - not a local program) that
> accepts wildcards so I can keep the numbering in the resulting alignment in
> register with the PDB? I've Googled round, and I'm surprised that I can't
> find this addressed anywhere.
>
> Thanks!
>
