A bug was introduced in later versions of the EMBOSS "union" program,
and I am not sure whether it has been fixed. As far as I know, union may
work with GFF files, but it messes up the feature locations in
concatenated GenBank and EMBL files. I reported the bug almost 2 years
ago. I have to use EMBOSS version 6.3.1 to concatenate files
correctly. Another program that can be used to join files is UGENE. Go to
Workflow Designer / Samples / "Merge sequences and shift corresponding
annotation" and select the correct files for input and output.
Occasionally this program will introduce two quotes ("") in place of
one ("). Just search and replace using a text editor.
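The search and replace can also be scripted. Below is a minimal Python
sketch, assuming every doubled quote in the file is an artifact. A
qualifier with a genuinely empty value or a deliberately escaped quote
would be altered too, so check the result before using it. The function
names are made up for illustration.

```python
def collapse_doubled_quotes(text):
    """Replace each pair of adjacent double-quote characters with one."""
    return text.replace('""', '"')


def fix_file(in_path, out_path):
    """Read an annotation file, collapse doubled quotes, write the result."""
    with open(in_path, encoding="utf-8") as handle:
        text = handle.read()
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write(collapse_doubled_quotes(text))
```

Writing to a separate output file keeps the original untouched, so the
two can be compared with diff afterwards.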
Bruno
On 25/02/14 04:51, Tim Carver wrote:
Re: [Artemis-users] loading a eukaryotic genome assembly(multifasta)
and annotation into Artemis 16.0.0
In Artemis, with a multi-fasta sequence file, the options for the
annotation file are to use a GFF file or to concatenate in some way
the EMBL/GenBank feature table (adjusting the coordinates to match the
correct position of the assembly). This is what 'union' should do with
EMBL files. I am not sure why this wasn't successful for you. You
obviously do need to use '-feature' with union to get the feature
table included:
union -feature -osf embl entry.embl
GFF and union are the two options used here.
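For anyone who wants to check what the coordinate adjustment should
produce, here is a minimal Python sketch of the idea. It is not what
union itself does internally. It assumes the records are concatenated in
FASTA file order with no spacer between them, and that features use only
local locations (a remote reference of the form accession:start..end
would be corrupted, because the digits in the accession would be shifted
too). The function names are made up for illustration.

```python
import re


def fasta_lengths(lines):
    """Return [(record_id, length), ...] in file order from multi-FASTA lines."""
    records = []
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, length))
            # The record id is the first word of the header line.
            fields = line[1:].split()
            name, length = (fields[0] if fields else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        records.append((name, length))
    return records


def offsets(records):
    """Map each record id to the number of bases that precede it."""
    result, total = {}, 0
    for name, length in records:
        result[name] = total
        total += length
    return result


def shift_location(location, offset):
    """Add offset to every number in a local feature location string."""
    return re.sub(r"\d+", lambda m: str(int(m.group()) + offset), location)
```

With two records of 6 and 3 bases, the second record gets an offset of
6, so a feature at 1..3 on it should appear at 7..9 on the concatenated
sequence:

```python
recs = fasta_lengths([">chr1 first", "ACGT", "AC", ">chr2", "GGG"])
off = offsets(recs)
print(shift_location("complement(join(1..3,<5..>9))", off["chr2"]))
```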
Regards
Tim
On 24/02/2014 20:05, "Steven Sullivan" <[email protected]> wrote:
ENA (EMBL) provides TEXT and FASTA file downloads for eukaryotic
assemblies. The FASTA download is a single multi-FASTA file
containing separate records for each chromosome. The TEXT download
is a single EMBL feature table concatenating all the feature
tables of the individual chromosomes. It does not contain the DNA
sequence.
Loading these two files into Artemis yields a view of the entire
assembly as a concatenated sequence, but only the features for the
first chromosome in the feature file are loaded.
I understand that this issue has been brought up before. (e.g.
https://www.mail-archive.com/artemis-users%40sanger.ac.uk/msg00690.html)
What I don't see is a workaround. Mention was made of the EMBOSS
'union' command, which I have tried, but I am unable to make
that generate an .embl file that contains the correctly remapped
coordinates of the features onto the concatenated sequence. The
closest I came to success was an .embl file that mapped the first
chromosome features only, and incorrectly, onto the concatenated
sequence.
Is there a 'correct' way to load a multi-FASTA record and its
annotation into Artemis? The Artemis user manual is rather opaque
on this topic.
_______________________________________________
Artemis-users mailing list
[email protected]
http://publists.sanger.ac.uk/mailman/listinfo/artemis-users
--
Bruno Donzelli
Research Associate
Dept. of Plant Pathology and Plant-Microbe Biology, Cornell University
Robert W. Holley Center for Agriculture and Health
538 Tower Road, Cornell University
Ithaca, NY 14853
Phone: 607 255-2179