Yes, it's really frustrating. The jungle of file formats, their variations,
and the constant need to convert one into another have always left me
wondering why that is. I would love to rant about it, but I will stop
here. I am not a programmer, and not all labs can have one at hand. I
would love to hire one, but I can't.
Bruno
On 25/02/14 12:01, Steven Sullivan wrote:
Bruno,
That is what happened to me using union (and I did use "-feature"):
the features were incorrectly mapped. I was using EMBOSS v6.5.7.
Thanks for the tip on ugene.
The alternative way I got this to work in Artemis was basically as Tim
described, which was to load an assembly sequence (multifasta) entry
+ a gff file entry. I could then write it back out as an .embl file
with the coordinates remapped to the concatenated assembly. The
process of loading the gff (which I downloaded from EupathDB) into
Artemis involved a lot of trial and error and massaging of the GFF
file. I also had to edit the final embl file feature table in order to
get gene IDs and products displayed correctly.
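The coordinate remapping described above can be sketched in a few lines of Python. This is only an illustration, not what Artemis or EMBOSS do internally. It assumes the records are joined in FASTA order with no spacer between them, and that column 1 of the GFF file matches the FASTA record IDs. The function names fasta_offsets and shift_gff are hypothetical.

```python
def fasta_offsets(fasta_text):
    """Return {record_id: offset}, where offset is the number of bases
    that precede the record when all records are joined in file order."""
    offsets = {}
    total = 0
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            offsets[current] = total
        elif current is not None:
            total += len(line)
    return offsets


def shift_gff(gff_text, offsets):
    """Shift start/end (columns 4 and 5) of every GFF feature line by the
    offset of the sequence named in column 1 (assumption: IDs match)."""
    out = []
    for line in gff_text.splitlines():
        cols = line.split("\t")
        if line.startswith("#") or len(cols) < 9 or cols[0] not in offsets:
            out.append(line)  # comments and unrecognised lines pass through
            continue
        shift = offsets[cols[0]]
        cols[3] = str(int(cols[3]) + shift)
        cols[4] = str(int(cols[4]) + shift)
        out.append("\t".join(cols))
    return "\n".join(out)
```

Calling shift_gff(gff_text, fasta_offsets(fasta_text)) returns the GFF text with coordinates expressed on the concatenated sequence. Column 1 is left unchanged, so the result may still need the kind of manual massaging mentioned above before Artemis displays it as intended.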
On Tue, Feb 25, 2014 at 10:31 AM, Bruno Donzelli <b...@cornell.edu> wrote:
There was a bug introduced in the later versions of the "union"
routine in EMBOSS and I am not sure if it was fixed. As far as I
know union may work with GFF files but messes up the location of
the features in concatenated genbank and embl files. I reported
the bug almost 2 years ago. I have to use EMBOSS version 6.3.1
to concatenate files correctly. Another program that can be used
to join files is ugene. Go to workflow designer/samples/merge
sequences and shift corresponding annotation, then select the correct
files for input and output. Occasionally this program will
introduce two quotes ("") in place of one ("). Just search and
replace using a text editor.
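For anyone who prefers a script to a text editor, the search and replace could look roughly like this. It is a sketch under assumptions: exactly where the doubled quotes appear in the output is not stated in this thread, so the function below (hypothetical name fix_doubled_quotes) simply collapses every "" into ", except on lines that end in an empty value such as /note="", which are assumed to be intentional.

```python
import re

# A line ending in ="" is assumed to be a deliberately empty qualifier value.
EMPTY_VALUE = re.compile(r'=""\s*$')


def fix_doubled_quotes(text):
    """Collapse doubled double quotes into single ones, line by line."""
    fixed = []
    for line in text.splitlines():
        if EMPTY_VALUE.search(line):
            fixed.append(line)  # leave empty values untouched
        else:
            fixed.append(line.replace('""', '"'))
    return "\n".join(fixed)
```

Note that EMBL and GenBank files also use "" to escape a quote inside a qualifier value, so it is worth comparing the result with the original before overwriting anything.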
Bruno
On 25/02/14 04:51, Tim Carver wrote:
In Artemis, with a multi-fasta sequence file, the options for the
annotation file are to use a GFF file or to concatenate the
EMBL/GenBank feature table in some way (adjusting the
coordinates to match the correct position in the assembly). This
is what 'union' should do with EMBL files. I am not sure why this
wasn't successful for you. You obviously do need to use
'-feature' with union to get the feature table included:
union --feature --osf embl entry.embl
GFF and union are the options used here.
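To make "adjusting the coordinates" concrete, here is a minimal Python sketch that shifts a single EMBL/GenBank location string by a fixed offset. It is not how union is implemented. The function name shift_location is hypothetical, and the approach assumes the location contains only local coordinates: a remote reference such as an accession number followed by a colon would be corrupted, because every run of digits is shifted.

```python
import re


def shift_location(location, offset):
    """Add offset to every integer in a feature location string.
    Assumes no remote accessions (e.g. 'J00194.1:100..202') are present."""
    return re.sub(r"\d+", lambda m: str(int(m.group()) + offset), location)
```

For a feature on the second chromosome, offset would be the length of the first chromosome, and so on cumulatively for later records. Operators such as complement(), join(), < and > are left as they are.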
Regards
Tim
On 24/02/2014 20:05, "Steven Sullivan" <sulli...@nyu.edu> wrote:
ENA (EMBL) provides TEXT and FASTA file downloads for
eukaryotic assemblies. The FASTA download is a single
multi-fasta file containing separate records for each
chromosome. The TEXT download is a single EMBL feature table
concatenating all the feature tables of the individual
chromosomes. It does not contain the DNA sequence.
Loading these two files into Artemis yields a view of the
entire assembly as a concatenated sequence, but only the
features for the first chromosome in the feature file are
loaded.
I understand that this issue has been brought up before.
(e.g.
https://www.mail-archive.com/artemis-users%40sanger.ac.uk/msg00690.html)
What I don't see is a workaround. Mention was made of the
EMBOSS 'union' command, which I have tried, but I am unable
to make that generate an .embl file that contains the
correctly remapped coordinates of the features onto the
concatenated sequence. The closest I came to success was an
.embl file that mapped the first chromosome features only,
and incorrectly, onto the concatenated sequence.
Is there a 'correct' way to load a multifasta record and
its annotation into Artemis? The Artemis user manual is
rather opaque on this topic.
_______________________________________________
Artemis-users mailing list
Artemis-users@sanger.ac.uk
http://publists.sanger.ac.uk/mailman/listinfo/artemis-users
--
Bruno Donzelli
Research Associate
Dept. of Plant Pathology and Plant-Microbe Biology, Cornell University
Robert W. Holley Center for Agriculture and Health
538 Tower Road, Cornell University
Ithaca, NY 14853
Phone: 607 255-2179
--
Dr. Steven Sullivan
Center for Genomics & Systems Biology
New York University
12 Waverly Place
New York, NY 10003