Hi Bernd,
Bernd Web wrote:
Hi,
I'd like to run iep on a sequence and use either pir or osformat gifasta.
The following gives an error (using emboss 5.0.0 on Debian):
iep -filter -osformat gifasta -sequence seq.txt
This returns "Died: Unknown qualifier -osformat"
-osformat is for sequence outputs (and iep has no sequence outputs)
iep writes a plain text file as output, with no special format options,
but we will add more information (accession and description) in a
future release, and to the other plain text output files too.
iep -filter -sformat pir seq.txt or iep -sformat pir -sequence seq.txt
also give an error:
"Died: iep terminated: Bad value for '-sequence' with -auto defined"
(with or without the sequence flag)
However, iep -sformat fasta seq.txt works. What am I doing wrong?
It appears your sequence can be read in fasta format but not in pir
format. PIR format has special characters after the first '>' (a
two-character type code and a semicolon, as in '>P1;').
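To illustrate the difference, here is a minimal Python sketch that checks whether a header line looks like PIR rather than plain FASTA. It assumes the usual PIR/NBRF layout of '>', a two-character type code (such as P1 for a protein), a semicolon, and then the ID. The name looks_like_pir is made up for this example and is not part of EMBOSS.

```python
import re

# Assumption: a PIR/NBRF header is '>' + two-character type code + ';' + ID,
# for example '>P1;MYSEQ'. This is a rough check, not a full format parser.
PIR_HEADER = re.compile(r"^>[A-Z][A-Z0-9];\S+")


def looks_like_pir(header: str) -> bool:
    """Return True if the header line has the '>XX;ID' shape used by PIR."""
    return PIR_HEADER.match(header) is not None


if __name__ == "__main__":
    for header in (">P1;ENSG00000205090_1", ">ENSG00000205090|1|protein_coding"):
        print(header, looks_like_pir(header))
```

A definition line like yours has no type code or semicolon after the '>', which is consistent with it being readable as fasta but not as pir.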
My FastA definition line is e.g.
ENSG00000205090|1|protein_coding.
The IEP report would be more useful if it contained the ENSG number
instead of "protein coding" or the entire definition line.
Not a nice format. NCBI made up a lot of FASTA file identifiers with '|'
characters and we try to follow their rules. That causes us to ignore
the first part (it should be a database name) and read the ID from the end.
You could reformat the FASTA files (e.g. with a perl script) to remove
the '|' characters and leave something useful as the plain ID (perhaps
ENSG00000205090_1 in this case) and the rest as description.
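
As a rough sketch of that reformatting step (in Python rather than perl), something like the following could work. It assumes every definition line has at least three '|'-separated fields, as in your example, joins the first two with '_' to make the ID, and keeps the rest as the description. The function names and the script name in the usage note are made up for this example.

```python
import sys


def rewrite_header(line: str) -> str:
    """Rewrite '>A|B|rest' as '>A_B rest'; return any other line unchanged."""
    if not line.startswith(">"):
        return line  # sequence line, leave alone
    fields = line[1:].rstrip("\n").split("|")
    if len(fields) < 3:
        return line  # not the expected layout, leave alone
    new_id = "_".join(fields[:2])        # e.g. ENSG00000205090_1
    description = " ".join(fields[2:])   # e.g. protein_coding
    return f">{new_id} {description}\n"


def rewrite_fasta(lines):
    """Apply rewrite_header to every line of a FASTA file."""
    for line in lines:
        yield rewrite_header(line)


if __name__ == "__main__":
    # Reads FASTA on stdin and writes the reformatted FASTA to stdout.
    sys.stdout.writelines(rewrite_fasta(sys.stdin))
```

Saved as, say, fix_ids.py, it would be run as "python fix_ids.py < seq.txt > seq_fixed.txt", and the new file can then be given to iep with -sformat fasta as before.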
Hope that helps,
Peter Rice
_______________________________________________
EMBOSS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/emboss