Re: [EMBOSS] iep/gifasta

Peter Rice Tue, 18 Dec 2007 01:28:51 -0800

Hi Bernd,

Bernd Web wrote:

Hi,


I'd like to run iep on a sequence and use either pir or osformat gifasta.
The following gives an error (using emboss 5.0.0 on Debian):

iep -filter -osformat gifasta -sequence seq.txt
This returns "Died: Unknown qualifier -osformat"


-osformat is for sequence outputs (and iep has no sequence outputs)

iep writes a plain text file as output and has no special output options

but we will add more information (accession and description) for a future release ... and to other plain text output files too.

iep -filter -sformat pir seq.txt or iep -sformat pir -sequence seq.txt
also give an error:
"Died: iep terminated: Bad value for '-sequence' with -auto defined"
(with or without the sequence flag)

However, iep -sformat fasta seq.txt works. What am I doing wrong?

It appears your sequence can be read in fasta format but not in PIR format. PIR format has special characters after the first '>'
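
As a rough illustration of the difference: a PIR (NBRF) entry starts with '>' followed by a two-character type code and a semicolon before the ID (for example '>P1;' for a protein), whereas a FASTA entry has the ID straight after '>'. Below is a minimal Python sketch of that check. The function name looks_like_pir is made up for this example, and this is not how EMBOSS itself detects the format.

```python
import re

# Assumption: a PIR/NBRF header is '>' + two-character type code + ';' + ID.
PIR_HEADER = re.compile(r"^>[A-Z0-9]{2};\S")


def looks_like_pir(header_line):
    """Return True if the first line of an entry has a PIR-style prefix."""
    return PIR_HEADER.match(header_line) is not None


if __name__ == "__main__":
    for header in (">P1;ENSG00000205090", ">ENSG00000205090|1|protein_coding"):
        print(header, looks_like_pir(header))
```

A file whose first line fails this check is plain FASTA, which would explain why -sformat fasta works and -sformat pir does not.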

My FastA definition line is e.g.

ENSG00000205090|1|protein_coding.

The IEP report would be more useful if it contained the ENSG number
instead of "protein_coding" or the entire definition line.

Not a nice format. NCBI made up a lot of FASTA file identifiers with '|' characters and we try to follow their rules. That causes us to ignore the first part (it should be a database name) and read the ID from the end.

You could reformat the FASTA files (e.g. with a perl script) to remove the '|' characters and leave something useful as the plain ID (perhaps ENSG00000205090_1 in this case) and the rest as description.
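
The reformatting suggested above could look something like the following. It is a sketch in Python rather than perl, and it assumes every definition line has the shape '>ID|number|rest'. The function names rewrite_header and rewrite_fasta are invented for this example. Lines that do not match the assumed shape are passed through unchanged.

```python
def rewrite_header(line):
    """Turn '>A|B|rest' into '>A_B rest'; leave all other lines untouched."""
    if not line.startswith(">"):
        return line  # sequence line
    fields = line[1:].rstrip("\n").split("|")
    if len(fields) < 2:
        return line  # no '|' in the header, nothing to do
    new_id = "_".join(fields[:2])
    # Everything after the second field becomes the description.
    description = " ".join(f for f in fields[2:] if f)
    ending = "\n" if line.endswith("\n") else ""
    return ">" + (new_id + " " + description).rstrip() + ending


def rewrite_fasta(lines):
    """Apply rewrite_header to every line of a FASTA file (any iterable)."""
    for line in lines:
        yield rewrite_header(line)


if __name__ == "__main__":
    example = [">ENSG00000205090|1|protein_coding\n", "MKTAYIAKQR\n"]
    for out in rewrite_fasta(example):
        print(out, end="")
```

To use it on a real file, pass an open file handle to rewrite_fasta and write the results to a new file, then run iep with -sformat fasta on that new file.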


Hope that helps,

Peter Rice


_______________________________________________
EMBOSS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/emboss
