Re: [EMBOSS] shuffleseq for multifasta?

David Bauer Thu, 08 Nov 2018 23:32:33 -0800

Hi Anand,

If you run “shuffleseq -help”, you will see the types of the input and output 
sequences.
Version: EMBOSS:6.5.7.0


   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     Sequence(s) filename and optional format, or
                                  reference (input USA)
  [-outseq]            seqoutall  [<sequence>.<format>] Sequence set(s)
                                  filename and optional format (output USA)

The “all” in seqall and seqoutall indicates that input and output can be 
sequence files with multiple sequences.
These can be in FASTA format or any other sequence format supported by EMBOSS 
(genbank, embl, etc.).
The sequence names from the original file are preserved in the output file.
If I try to reproduce your example with the file downloaded from IPK:

shuffleseq Athaliana_167_TAIR9.fa test1.fa

the output file contains the sequences as named in the input file:

infoseq -only -name -desc test1.fa
Name           Description
Chr1           CHROMOSOME dumped from ADB: Feb/3/09 16:9; last updated: 2007-12-20
Chr2           CHROMOSOME dumped from ADB: Feb/3/09 16:10; last updated: 2007-12-20
Chr3           CHROMOSOME dumped from ADB: Feb/3/09 16:10; last updated: 2007-12-20
Chr4           CHROMOSOME dumped from ADB: Feb/3/09 16:10; last updated: 2007-12-20
Chr5           CHROMOSOME dumped from ADB: Feb/3/09 16:10; last updated: 2007-12-20
ChrM           CHROMOSOME dumped from ADB: Feb/3/09 16:10; last updated: 2005-06-03
ChrC           CHROMOSOME dumped from ADB: Feb/3/09 16:10; last updated: 2005-06-03
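
If infoseq is not at hand, roughly the same comparison can be sketched with standard shell tools. This assumes plain FASTA, where every header line starts with ">" and the sequence ID is the first whitespace-delimited token. The function names fasta_ids and same_ids are made up for this illustration and are not part of EMBOSS.

```shell
# List the sequence IDs of a FASTA file:
# the first whitespace-delimited token of each ">" header line.
fasta_ids() {
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}'
}

# Succeed only if both files carry the same IDs in the same order.
same_ids() {
    [ "$(fasta_ids "$1")" = "$(fasta_ids "$2")" ]
}
```

For example, "same_ids Athaliana_167_TAIR9.fa test1.fa && echo names preserved" would print the message only when the shuffled file kept the original names. Repeated names, such as Chr1 on every record, would make the check fail.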

The name of your input file contains “shIDscleaned-up”, so you may have modified 
the sequence names in a way that confuses EMBOSS.
You can test this by running infoseq as above and checking whether “Name” shows 
what you expect.
Make sure there are no “:” characters in the sequence names in your FASTA file; 
this character has a special meaning in EMBOSS sequence names.
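
A quick way to look for such names, sketched here with standard shell tools only: the function below prints every sequence ID that contains a “:”. It assumes the ID is the first whitespace-delimited token of each ">" header line, so colons in the description (as in the “16:9” timestamps above) are ignored. The name ids_with_colon is hypothetical.

```shell
# Print the sequence IDs that contain ":".
# Only the first token after ">" is inspected, not the description.
# Exit status follows grep: 0 if at least one such ID exists, 1 otherwise.
ids_with_colon() {
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | grep ':'
}
```

Running "ids_with_colon yourfile.fa" and getting no output suggests the names are free of “:”.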

Hope this helps.

Sincerely,
David.

From: EMBOSS <[email protected]> On Behalf Of Anandkumar Surendrarao
Sent: 09 November 2018 04:20
To: [email protected]
Subject: [EMBOSS] shuffleseq for multifasta?

Greetings!

I am new to EMBOSS, and trying to use shuffleseq to randomly shuffle entire 
genomes (one-by-one). My input genomic sequences are in multifasta format. And 
I wish to retain the same multifasta format for the output file as well, 
containing the shuffled DNA sequences.

From the information at 
http://emboss.sourceforge.net/apps/cvs/emboss/apps/shuffleseq.html, it appears 
to me that FASTA format is supported for neither input nor output. Am I 
mistaken?
OR
Is there a way to specify (multi)FASTA as both input and output formats?

In one run that I completed with a genome assembly with 5 chromosomes - Chr1 ... 
Chr5, the syntax I used was:
shuffleseq -sequence Athaliana_167_TAIR9.fa.shIDscleaned-up -outseq Athaliana_167_TAIR9_EmbossShuffled.fas

Strangely, in the output file, the fasta headers were all repetitive Chr1.
Hence my confusion. Could someone please clarify what my input formatting 
should be and the correct syntax?

Thanks, in advance, for your help.

Sincerely,
Anand
_____
Anandkumar Surendrarao, PhD
+1.530.574.5134
+91.91760.70887

_______________________________________________
EMBOSS mailing list
[email protected]
http://mailman.open-bio.org/mailman/listinfo/emboss
