Tao Song wrote:
> Hi,
>
> I wonder can the iep program that calculates the isoelectric point of
> a protein be used for a protein database? When asked to input protein
> sequence I gave 'tsw' instead of 'tsw:laci_ecoli' I got an error that
> said 'sequence must be protein sequence without BZ U X or *: found bad
> character Z'. Does iep can only take one protein sequence as input file?

Your command does read the test swissprot database, but it fails on an
entry that is a sequence fragment containing a Z ambiguity code.

For the next release, I have a patch that converts B to D/N and Z to E/Q,
using the Dayhoff frequencies of naturally occurring amino acids. It
converts the first B or Z to a charged residue (as these are more common),
the second to an uncharged residue, and so on. With this change in place,
iep can be modified to accept any protein sequence and will produce
consistent results on ambiguity codes.

A question: we could try this fix as a general solution for programs
requiring "pureprotein" input, by converting any B or Z (or J) ambiguity
code. Is this useful? For iep the order does not matter and the converted
sequence does not appear in the output, but I think a program-by-program
solution is better.

Other programs insisting on "pureprotein" input are hmoment, octanol and
pepwindow.

regards,

Peter

_______________________________________________
EMBOSS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/emboss
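
To illustrate the alternating conversion described in the reply, here is a
minimal Python sketch. It is not the actual patch. The function name
resolve_ambiguity is hypothetical, and it assumes that B and Z each keep
their own alternation counter; the reply does not say whether the counter
is shared between the two codes. J is left untouched because no rule for
it is given.

```python
from itertools import cycle

# Charged residue first, then uncharged, repeating (as described in the reply).
AMBIGUITY_CHOICES = {
    "B": ("D", "N"),  # Asp (charged) or Asn (uncharged)
    "Z": ("E", "Q"),  # Glu (charged) or Gln (uncharged)
}


def resolve_ambiguity(sequence):
    """Replace each B or Z with alternating charged/uncharged residues.

    Assumption: each ambiguity code alternates independently of the other.
    """
    # One independent alternating iterator per ambiguity code.
    choosers = {code: cycle(pair) for code, pair in AMBIGUITY_CHOICES.items()}
    resolved = []
    for residue in sequence.upper():
        if residue in choosers:
            resolved.append(next(choosers[residue]))
        else:
            # All other residues, including J, pass through unchanged.
            resolved.append(residue)
    return "".join(resolved)
```

For example, resolve_ambiguity("ABZBZK") gives "ADENQK": the first B and Z
become the charged D and E, the second become the uncharged N and Q.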
