Hi Bernd,

Bernd Web wrote:
Hi,

I'd like to run iep on a sequence and use either pir or osformat gifasta.
The following gives an error (using emboss 5.0.0 on Debian):

iep -filter -osformat gifasta -sequence seq.txt
This returns "Died: Unknown qualifier -osformat"

-osformat is for sequence outputs (and iep has no sequence outputs)

iep writes a plain text file as output and has no special output format options,
but we will add more information (accession and description) in a future release, and to other plain text output files too.

iep -filter -sformat pir seq.txt or iep -sformat pir -sequence seq.txt
also give an error:
"Died: iep terminated: Bad value for '-sequence' with -auto defined"
(with or without the sequence flag)

However, iep -sformat fasta seq.txt works. What am I doing wrong?

It appears your sequence can be read in fasta format but not in pir format. PIR format has special characters after the first '>'
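
To show what those special characters look like, here is a minimal Python sketch that lays out one record in the NBRF/PIR style as I understand it: a '>P1;' prefix before the ID, the description on a line of its own, and a '*' terminating the sequence. The function name to_pir and the 60-column wrap are illustrative assumptions, not part of EMBOSS.

```python
def to_pir(identifier, description, sequence, width=60):
    """Format one protein record in NBRF/PIR-style layout (a sketch).

    Assumed layout: '>P1;ID' on the first line ('P1' marks a complete
    protein), the description on the second line, then the sequence
    wrapped to `width` columns and terminated by '*'.
    """
    body = sequence + "*"  # the '*' marks the end of the sequence
    lines = [f">P1;{identifier}", description]
    for start in range(0, len(body), width):
        lines.append(body[start:start + width])
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_pir("ENSG00000205090_1", "protein_coding", "MKV"))
```

A file whose first line is just '>' followed by the definition line has none of this, which would explain why -sformat pir rejects it while -sformat fasta works.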

My FastA definition line is e.g.
ENSG00000205090|1|protein_coding.
The IEP report would be more useful if it contained the ENSG number
instead of "protein_coding" or the entire definition line.

Not a nice format. NCBI made up a lot of FASTA file identifiers with '|' characters and we try to follow their rules. That causes us to ignore the first part (it should be a database name) and read the ID from the end.

You could reformat the FASTA files (e.g. with a perl script) to remove the '|' characters and leave something useful as the plain ID (perhaps ENSG00000205090_1 in this case) and the rest as description.
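
The reformatting could be sketched as follows, in Python rather than Perl. This is only an illustration under stated assumptions: every definition line has at least three '|'-separated fields, the first two are joined with '_' to form the ID, and the rest becomes the description. The function names reformat_header and reformat_fasta are hypothetical.

```python
def reformat_header(line):
    """Rewrite '>A|B|C' as '>A_B C'; return any other line unchanged."""
    if not line.startswith(">"):
        return line  # sequence line, nothing to do
    fields = line[1:].strip().split("|")
    if len(fields) < 3:
        return line  # not the layout assumed here, leave it alone
    new_id = "_".join(fields[:2])        # e.g. ENSG00000205090_1
    description = " ".join(fields[2:])   # e.g. protein_coding
    return f">{new_id} {description}"


def reformat_fasta(lines):
    """Apply reformat_header to every line of a FASTA file."""
    return [reformat_header(line.rstrip("\n")) for line in lines]


if __name__ == "__main__":
    example = [">ENSG00000205090|1|protein_coding\n", "MKV\n"]
    print("\n".join(reformat_fasta(example)))
```

Write the result to a new file and run iep on that file as before; the ID should then be the plain ENSG00000205090_1 rather than something read from the end of the line.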

Hope that helps,

Peter Rice
