Yeah, it's really frustrating. The jungle of file formats, their variations, and the constant need to convert one into another has always left me wondering why things are this way. I would love to rant about it, but I will stop here. I am not a programmer, and not all labs can have one at hand. I would love to hire one, but I can't.

Bruno



On 25/02/14 12:01, Steven Sullivan wrote:
Bruno,
That is what happened to me using union (and I did use "-feature"): the features were incorrectly mapped. I was using EMBOSS v6.5.7. Thanks for the tip on ugene.

The alternative way I got this to work in Artemis was basically as Tim described, which was to load an assembly sequence (multifasta) entry + a gff file entry. I could then write it back out as an .embl file with the coordinates remapped to the concatenated assembly. The process of loading the gff (which I downloaded from EupathDB) into Artemis involved a lot of trial and error and massaging of the GFF file. I also had to edit the final embl file feature table in order to get gene IDs and products displayed correctly.
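
The coordinate remapping described above amounts to adding, to each feature, the combined length of all sequences that precede its chromosome in the multi-FASTA file. The Python sketch below shows only that arithmetic. It is not how Artemis does it internally. The function names and the merged sequence ID "assembly" are hypothetical, and the sketch assumes tab-delimited 9-column GFF lines whose first column matches the FASTA record IDs, with records concatenated in file order and no spacer between them.

```python
from typing import Dict, Iterable, List


def fasta_offsets(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Map each record ID to the number of bases that precede it."""
    offsets: Dict[str, int] = {}
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # The record ID is the first token after '>'.
            offsets[line[1:].split()[0]] = total
        elif line:
            total += len(line)
    return offsets


def remap_gff(gff_lines: Iterable[str], offsets: Dict[str, int],
              merged_id: str = "assembly") -> List[str]:
    """Shift start/end (columns 4 and 5) onto the concatenated sequence."""
    out: List[str] = []
    for line in gff_lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9:
            out.append(line)  # pass comments and non-feature lines through
            continue
        shift = offsets[cols[0]]  # KeyError if the ID is not in the FASTA
        cols[3] = str(int(cols[3]) + shift)
        cols[4] = str(int(cols[4]) + shift)
        cols[0] = merged_id  # hypothetical name for the merged sequence
        out.append("\t".join(cols))
    return out
```

With two records of 6 and 3 bases, a feature at 1..3 on the second record would land at 7..9 on the concatenated sequence.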




On Tue, Feb 25, 2014 at 10:31 AM, Bruno Donzelli <b...@cornell.edu> wrote:

    A bug was introduced in the later versions of the "union"
    routine in EMBOSS, and I am not sure whether it has been fixed. As
    far as I know, union may work with GFF files but messes up the
    location of the features in concatenated GenBank and EMBL files. I
    reported the bug almost 2 years ago. I have to use EMBOSS version
    6.3.1 to concatenate files correctly. Another program that can be
    used to join files is UGENE. Go to workflow designer/samples/"merge
    sequences and shift corresponding annotation". Select the correct
    files for input and output. Occasionally this program will
    introduce 2 quotes ("") in place of one ("). Just search and
    replace using a text editor.
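
The search-and-replace step can also be scripted. The Python sketch below collapses each doubled quote into a single one. The function name is hypothetical, and skipping lines whose qualifier value is exactly `""` is an added assumption, made so that legitimately empty values are not damaged. Where the stray quotes actually appear in the merged output is not specified above, so check the result by eye.

```python
import re

# A qualifier whose whole value is an empty string, e.g. /note=""
EMPTY_VALUE = re.compile(r'=""\s*$')


def collapse_doubled_quotes(text: str) -> str:
    """Replace "" with " on every line except empty-valued qualifiers."""
    fixed = []
    for line in text.splitlines(keepends=True):
        if EMPTY_VALUE.search(line):
            fixed.append(line)  # assumed to be a genuine empty value
        else:
            fixed.append(line.replace('""', '"'))
    return "".join(fixed)
```

Doubled quotes that were meant as an escaped quote inside a qualifier value would also be collapsed by this sketch.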

    Bruno


    On 25/02/14 04:51, Tim Carver wrote:

    In Artemis, with a multi-fasta sequence file, the options for the
    annotation file are to use a GFF file or to concatenate the
    EMBL/GenBank feature table in some way (adjusting the coordinates
    to match the correct position in the assembly). This is what
    'union' should do with EMBL files. I am not sure why this wasn't
    successful for you. You obviously do need to use '-feature' with
    union to get the feature table included:

    union --feature --osf embl  entry.embl

    GFF and union are the options used here.
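
To illustrate what "adjusting the coordinates" means for an EMBL/GenBank feature table, the Python sketch below shifts every position in a feature location string by a fixed offset, namely the total length of the sequences placed before that entry. This is not the EMBOSS implementation. The function name is hypothetical, and the sketch assumes purely local locations, so it rejects remote references of the form accession:range.

```python
import re


def shift_location(location: str, offset: int) -> str:
    """Add offset to every coordinate in a local feature location."""
    if ":" in location:
        # Remote references (accession:range) are out of scope here.
        raise ValueError("remote references are not handled in this sketch")
    # Each run of digits in a local location is a sequence position.
    return re.sub(r"\d+", lambda m: str(int(m.group()) + offset), location)
```

For example, with an offset of 1000, `complement(join(100..200,<300..>400))` becomes `complement(join(1100..1200,<1300..>1400))`.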

    Regards
    Tim

    On 24/02/2014 20:05, "Steven Sullivan" <sulli...@nyu.edu> wrote:

        ENA (EMBL) provides TEXT and FASTA file downloads for
        eukaryotic assemblies. The FASTA download is a single
        multi-fasta file containing separate records for each
        chromosome. The TEXT download is a single EMBL feature table
        concatenating all the feature tables of the individual
        chromosomes. It does not contain the DNA sequence.

        Loading these two files into Artemis yields a view of the
        entire assembly as a concatenated sequence, but only the
        features for the first chromosome in the feature file are
        loaded.

        I understand that this issue has been brought up before
        (e.g.
        https://www.mail-archive.com/artemis-users%40sanger.ac.uk/msg00690.html).
        What I don't see is a workaround. Mention was made of the
        EMBOSS 'union' command, which I have tried, but I am unable
        to make it generate an .embl file that contains the
        correctly remapped coordinates of the features onto the
        concatenated sequence. The closest I came to success was an
        .embl file that mapped the first chromosome's features only,
        and incorrectly, onto the concatenated sequence.


        Is there a 'correct' way to load a multifasta record and
        its annotation into Artemis? The Artemis user manual is
        rather opaque on this topic.



    _______________________________________________
    Artemis-users mailing list
    Artemis-users@sanger.ac.uk
    http://publists.sanger.ac.uk/mailman/listinfo/artemis-users


    --
    Bruno Donzelli
    Research Associate
    Dept. of Plant Pathology and Plant-Microbe Biology, Cornell University
    Robert W. Holley Center for Agriculture and Health
    538 Tower Road, Cornell University
    Ithaca, NY 14853
    Phone: 607 255-2179




--
Dr. Steven Sullivan
Center for Genomics & Systems Biology
New York University
12 Waverly Place
New York, NY 10003



