Dear Tim,

Import via seqret was not faster than manually adapting the Artemis output file. This is how I did it (no guarantee...):

1) Replace fasta_record with source, keeping the right number of spaces (feature locations and qualifiers start at column 22). Under each FT source line, add the qualifiers /organism="text" and /mol_type="genomic DNA" (these are compulsory). A perl script helps here, see example =a=.

2) At the top, add ID, DT, DE and OS lines (see example =b=; XX is an empty line), followed by

FH   Key             Location/Qualifiers

Then make an FT source line that spans the whole sequence. Give it the qualifiers /organism="text" and /mol_type="genomic DNA" (compulsory again) and /focus.

/focus indicates that the sequence consists of more than one "source". I would rather use the word contig, because /focus implies that the annotated genome is a contiguous sequence, which is not the case for me. Unfortunately /contig is not an official qualifier.

As far as I can judge, this is sufficient to import all features into CLC. The only thing I would still like to do is tell CLC that it is not a contiguous sequence.

Examples

=a= Example perl one-liner

To insert these two lines after each FT source line:

FT                   /organism="Organism name"
FT                   /mol_type="genomic DNA"

perl -p -i -e 's/FT   source\s{10}\d+\.\.\d+/$&\nFT                   \/organism=\"Organism name\"\nFT                   \/mol_type=\"genomic DNA\"/g' SourceFileArtemis.art

=b= Header

ID   linear; genomic DNA; XX
XX
DT   02-NOV-2010
XX
DE   Organism name genome
XX
OS   Organism name
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..5000000
FT                   /organism="Organism name"
FT                   /mol_type="genomic DNA"
FT                   /note="text"
FT                   /focus

> Hi Jack
>
> Artemis is not really meant as a conversion tool between formats and in
> particular EMBL/GenBank to GFF, although it will have a go. You could try
> EMBOSS (seqret) to convert. However, it sounds like you have multiple fasta
> records in your file which may cause problems if you are writing out embl
> files. So you may want to try writing the sequence out (File->Write->All
> bases). Open this single sequence file and then read your annotation into
> the sequence entry. Then write out the file as EMBL.
>
> Regards
> Tim
>
>
> On 10/21/10 12:26 PM, "Jack van de Vossenberg"
> <j.vandevossenb...@science.ru.nl> wrote:
>
>> Addition:
>> For Artemis export to GFF, I changed "fasta_record" to "source", which
>> is a Feature Key in standard nomenclature*, and added the mandatory
>> fields /organism= and /mol_type=, but every time I get a message that
>> the source field cannot be exported.
>>
>> Is that normal behaviour? Can anyone tell what goes wrong?
>>
>> Cheers, Jack
>>
>> *
>> http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
>>
>> On 10/21/2010 10:27 AM, j.vandevossenb...@science.ru.nl wrote:
>>> Dear all,
>>>
>>> I have annotated genome data in Artemis, and I would like to import the
>>> result of that annotation into CLC Bio Genomics Workbench
>>> (http://www.clcbio.com/).
>>>
>>> I tried direct import, selected all entries and exported from Artemis to
>>> EMBL, Genbank and Sequin. None of these were recognized by CLC, even
>>> though it should be able to import many file formats
>>> (http://www.clcbio.com/index.php?id=426).
>>> I tried SFF, which does not include sequence data. So I used a separate
>>> sequence file, the contigs concatenated into one large fasta sequence. CLC
>>> has a SFF import filter, which is very picky about the sequence names
>>> (read CLC SFF import manual). I managed to let it import SFF, but I did
>>> not see any annotation at all, I think because all ORFs are named artemis
>>> ("gff_seqname artemis"). Contig names are lost in SFF, so this option may
>>> import all annotated genes, but lose contig info. SFF does not recognize
>>> "fasta record" so I should rename this into something (but what? I tried
>>> "contig", "source", but the GFF file keeps on using ORFs only, all named
>>> "gff_seqname artemis").
>>>
>>> Does anyone have experience with this? I thought of using another program
>>> as intermediate to convert Artemis data into CLC readable data.
>>>
>>> Thanks for your help, Jack
>>>
>>> _______________________________________________
>>> Artemis-users mailing list
>>> Artemis-users@sanger.ac.uk
>>> http://lists.sanger.ac.uk/mailman/listinfo/artemis-users
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
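
For anyone who prefers a script to the perl one-liner, steps 1 and 2 at the top of this message can be sketched roughly as follows in Python. This is only a sketch under assumptions: it assumes the usual EMBL feature-table layout (feature key at column 6, location and qualifiers at column 22) and that the contig features appear as lines starting with "FT   fasta_record". The function names are made up for illustration and are not part of Artemis, EMBOSS or CLC.

```python
import re

# Assumed EMBL layout: "FT", key at column 6, location/qualifiers at column 22.
QUALIFIER_INDENT = "FT" + " " * 19

# A source feature line, e.g. "FT   source          1..5000".
SOURCE_LINE = re.compile(r"^FT   source\s+\d+\.\.\d+\s*$")


def add_source_qualifiers(lines, organism):
    """Step 1: rename fasta_record to source and add the compulsory qualifiers."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("FT   fasta_record"):
            # "fasta_record" is 12 characters wide; pad "source" to the same
            # width so the location stays at column 22.
            line = line.replace("fasta_record", "source".ljust(12), 1)
        out.append(line)
        if SOURCE_LINE.match(line):
            out.append(QUALIFIER_INDENT + f'/organism="{organism}"')
            out.append(QUALIFIER_INDENT + '/mol_type="genomic DNA"')
    return out


def whole_sequence_source(length, organism):
    """Step 2: the FT source feature that spans the whole sequence."""
    return [
        "FT   source" + " " * 10 + f"1..{length}",
        QUALIFIER_INDENT + f'/organism="{organism}"',
        QUALIFIER_INDENT + '/mol_type="genomic DNA"',
        QUALIFIER_INDENT + "/focus",
    ]
```

To use it, read the Artemis file into a list of lines, pass it through add_source_qualifiers, and write the header lines from example =b= followed by the output of whole_sequence_source in front of the result. The ID, DT, DE and OS lines are deliberately left out of the sketch and still have to be added by hand.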