Dear Tim,

Importing via seqret was no faster than manually adapting the Artemis
output file. This is how I did it (no guarantee...):


1) Replace fasta_record with source, keeping the right number of spaces
(locations and qualifiers start at column 22), and add the qualifiers
/organism="text" and /mol_type="genomic DNA" (these are compulsory) under
each FT   source line (use a perl script, see example =a=).

2) At the top, add ID, DT, DE and OS lines (see example =b=; XX is an
empty spacer line) and add the line
FH   Key             Location/Qualifiers
Then make an FT   source line that spans the whole sequence. Add the
qualifiers /organism="text" and /mol_type="genomic DNA" (these are
compulsory) and /focus.

/focus indicates that the sequence consists of several "sources". I
would rather use the word contig, because /focus implies that the
annotated genome is a contiguous sequence, which is not the case for me.
Unfortunately /contig is not an official qualifier.

As far as I can judge, this is sufficient to import all features into
CLC. I would only still like to tell CLC that it is not a contiguous
sequence.
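
Before importing, it may help to check that every source feature really
carries both compulsory qualifiers. The following is only a sketch in
Python under my own assumptions: the function name
sources_missing_qualifiers is made up for illustration, and it assumes
the usual EMBL layout (feature key in column 6, qualifiers at column 22).

```python
import re

# A source feature line: "FT", three spaces, then the key "source" in column 6.
SOURCE_RE = re.compile(r"^FT   source\s")
# A qualifier line: "FT" plus 19 spaces, so that "/name" starts at column 22.
QUALIFIER_RE = re.compile(r"^FT {19}/([A-Za-z_]+)")
# The two qualifiers described above as compulsory.
REQUIRED = {"organism", "mol_type"}


def sources_missing_qualifiers(text):
    """Return (line_number, missing_names) for each source feature that
    lacks /organism or /mol_type. Line numbers are 1-based."""
    problems = []
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if not SOURCE_RE.match(line):
            continue
        found = set()
        for follow in lines[i + 1:]:
            # A non-FT line, or a new feature key in column 6, ends this feature.
            if not follow.startswith("FT   ") or follow[5:6].strip():
                break
            m = QUALIFIER_RE.match(follow)
            if m:
                found.add(m.group(1))
        missing = REQUIRED - found
        if missing:
            problems.append((i + 1, sorted(missing)))
    return problems
```

An empty list means every source feature found has both qualifiers; it
says nothing about whether CLC will accept the rest of the file.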


Examples

=a= example perl oneliner
To insert after the line with FT   source:
FT                   /organism="Organism name"
FT                   /mol_type="genomic DNA"

(type the following as one single line; -i changes the file in place)

perl -p -i -e 's/FT   source\s{10}\d+\.\.\d+/$&\nFT                   \/organism=\"Organism name\"\nFT                   \/mol_type=\"genomic DNA\"/g' SourceFileArtemis.art
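
The same insertion could also be sketched in Python, for instance when
the original file should stay untouched. This is an illustrative
equivalent of the one-liner, not the command that was actually used; the
function name add_source_qualifiers and its default values are
assumptions. Unlike the Perl pattern, this version only matches when the
simple start..end location is the last thing on the line.

```python
import re

# "FT   source", ten whitespace characters, then a simple start..end
# location beginning at column 22 (same idea as the Perl pattern).
SOURCE_LINE = re.compile(r"^FT   source\s{10}\d+\.\.\d+\s*$")

# Qualifier lines: "FT" plus 19 spaces, so the "/" lands in column 22.
QUALIFIER_INDENT = "FT" + " " * 19


def add_source_qualifiers(text, organism="Organism name",
                          mol_type="genomic DNA"):
    """Return a copy of text with /organism and /mol_type lines inserted
    directly after every matching FT source line."""
    out = []
    for line in text.splitlines():
        out.append(line)
        if SOURCE_LINE.match(line):
            out.append(f'{QUALIFIER_INDENT}/organism="{organism}"')
            out.append(f'{QUALIFIER_INDENT}/mol_type="{mol_type}"')
    return "\n".join(out) + "\n"
```

To use it, read SourceFileArtemis.art into a string, pass it through the
function and write the result to a new file.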

=b= Header
ID   linear; genomic DNA;
XX
XX
DT   02-NOV-2010
XX
DE   Organism name genome
XX
OS   Organism name
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..5000000
FT                   /organism="Organism name"
FT                   /mol_type="genomic DNA"
FT                   /note="text"
FT                   /focus
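
If the header has to be produced for several files, it could be
generated instead of typed. Below is a rough Python sketch that simply
reproduces the layout of example =b=; the function name embl_header and
its parameters are hypothetical, and the ID line is copied as it stands
in the example rather than completed.

```python
from datetime import date

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def embl_header(organism, seq_length, note=None, today=None):
    """Build the header of example =b= with a source feature that spans
    the whole sequence (1..seq_length)."""
    today = today or date.today()
    stamp = f"{today.day:02d}-{MONTHS[today.month - 1]}-{today.year}"
    indent = "FT" + " " * 19          # qualifiers start at column 22
    lines = [
        "ID   linear; genomic DNA;",  # copied from the example as-is
        "XX",
        "XX",
        f"DT   {stamp}",
        "XX",
        f"DE   {organism} genome",
        "XX",
        f"OS   {organism}",
        "XX",
        "FH   " + "Key".ljust(16) + "Location/Qualifiers",
        "FH",
        "FT   " + "source".ljust(16) + f"1..{seq_length}",
        f'{indent}/organism="{organism}"',
        f'{indent}/mol_type="genomic DNA"',
    ]
    if note is not None:
        lines.append(f'{indent}/note="{note}"')
    lines.append(f"{indent}/focus")
    return "\n".join(lines) + "\n"
```

The returned text is meant to be placed in front of the per-contig
feature lines; seq_length has to be the length of the concatenated
sequence.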


> Hi Jack
>
> Artemis is not really meant as a conversion tool between formats and in
> particular EMBL/GenBank to GFF, although it will have a go. You could try
> EMBOSS (seqret) to convert. However, it sounds like you have multiple
> fasta records in your file, which may cause problems if you are writing
> out EMBL files. So you may want to try writing the sequence out
> (File->Write->All bases). Open this single sequence file and then read
> your annotation into the sequence entry. Then write out the file as EMBL.
>
> Regards
> Tim
>
>
> On 10/21/10 12:26 PM, "Jack van de Vossenberg"
> <j.vandevossenb...@science.ru.nl> wrote:
>
>> Addition:
>> For Artemis export to GFF, I changed "fasta_record" to "source", which
>> is a Feature Key in standard nomenclature*, and added the mandatory
>> fields /organism= and /mol_type=, but every time I get a message that
>> the source field cannot be exported.
>>
>> Is that normal behaviour? Can anyone tell what goes wrong?
>>
>> Cheers, Jack
>>
>> *
>> http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
>>
>> On 10/21/2010 10:27 AM, j.vandevossenb...@science.ru.nl wrote:
>>> Dear all,
>>>
>>> I have annotated genome data in Artemis, and I would like to import the
>>> result of that annotation into CLC Bio Genomics Workbench
>>> (http://www.clcbio.com/).
>>>
>>> I tried direct import, selected all entries and exported from Artemis
>>> to EMBL, Genbank and Sequin. None of these were recognized by CLC, even
>>> though it should be able to import many file formats
>>> (http://www.clcbio.com/index.php?id=426).
>>> I tried GFF, which does not include sequence data. So I used a separate
>>> sequence file, the contigs concatenated into one large fasta sequence.
>>> CLC has a GFF import filter, which is very picky about the sequence
>>> names (read the CLC GFF import manual). I managed to let it import GFF,
>>> but I did not see any annotation at all, I think because all ORFs are
>>> named artemis ("gff_seqname artemis"). Contig names are lost in GFF, so
>>> this option may import all annotated genes, but lose contig info. GFF
>>> does not recognize "fasta record", so I should rename this into
>>> something (but what? I tried "contig" and "source", but the GFF file
>>> keeps on using ORFs only, all named "gff_seqname artemis").
>>>
>>> Does anyone have experience with this? I thought of using another
>>> program as an intermediate to convert Artemis data into CLC-readable
>>> data.
>>>
>>> Thanks for your help, Jack
>>>
>>>
>>> _______________________________________________
>>> Artemis-users mailing list
>>> Artemis-users@sanger.ac.uk
>>> http://lists.sanger.ac.uk/mailman/listinfo/artemis-users
>>>
>>
>
>
>
>
> --
>  The Wellcome Trust Sanger Institute is operated by Genome Research
>  Limited, a charity registered in England with number 1021457 and a
>  company registered in England with number 2742969, whose registered
>  office is 215 Euston Road, London, NW1 2BE.
>


