Hi

The gene models are stored that way in the Pathogen database here for a
couple of reasons. People felt that they might want to annotate to a UTR,
and in some ways it is a closer representation of how standard Artemis
works, with separate UTR and CDS features.

However, I will now look into this so that Artemis can optionally work with
this alternative representation and correctly infer the UTRs.
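
As a rough illustration of the inference involved (a sketch in Python, not
Artemis code): given the exon intervals and the outermost coding coordinates,
the UTR and CDS segments can be recovered by clipping each exon against the
coding span. The function name infer_utrs is hypothetical, and the sketch
assumes that the polypeptide feature in Chado spans the first to the last
coding base, which should be checked against the loaded data.

```python
def infer_utrs(exons, cds_start, cds_end, strand="+"):
    """Split exon intervals into 5' UTR, CDS and 3' UTR segments.

    exons: list of (start, end) tuples, 1-based and inclusive as in GFF3.
    cds_start, cds_end: outermost coding coordinates (assumed here to be
    the span of the polypeptide feature).
    """
    upstream, coding, downstream = [], [], []
    for start, end in sorted(exons):
        # Part of the exon lying before the first coding base.
        if start < cds_start:
            upstream.append((start, min(end, cds_start - 1)))
        # Part of the exon overlapping the coding span.
        if end >= cds_start and start <= cds_end:
            coding.append((max(start, cds_start), min(end, cds_end)))
        # Part of the exon lying after the last coding base.
        if end > cds_end:
            downstream.append((max(start, cds_end + 1), end))
    # On the reverse strand the genomic left-hand side is the 3' end.
    if strand == "-":
        upstream, downstream = downstream, upstream
    return {
        "five_prime_UTR": upstream,
        "CDS": coding,
        "three_prime_UTR": downstream,
    }


# Exon coordinates taken from the GFF3 example quoted in this thread.
exons = [
    (6164086, 6164456),
    (6164548, 6165169),
    (6165239, 6165415),
    (6165477, 6165743),
]
model = infer_utrs(exons, cds_start=6164128, cds_end=6165743, strand="+")
for feature_type, segments in model.items():
    print(feature_type, segments)
```

For the example gene this gives back the single 5' UTR segment and the four
CDS segments of the original GFF3, with no 3' UTR.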

Regards
Tim

On 6/5/09 4:44 PM, "Leighton Pritchard" <lpr...@scri.ac.uk> wrote:

> Hi,
> 
> With apologies for the cross-post, but I'll take an answer from either camp
> if it solves my problem.
> 
> I'm using Artemis v11, and Chado/scripts from GMOD-1.0, with PostgreSQL 8.3.
> 
> My problem at the moment is this: I have a genome annotation with gene
> models that look like they satisfy the Sequence Ontology parent-child
> relationships, and are correct.  The upload script for Chado modifies these
> gene models in a way that I'm not entirely happy with; I would prefer to
> keep the original gene model information.  When viewing the information from
> Chado in Artemis, Artemis appears to make invalid assumptions about the gene
> model in Chado; it does not do this when reading the annotation from a local
> .gff3 file.
> 
> My example case:
> 
> I have a GFF3 format file with entries like this:
> 
> supercont1.1    .    gene    6164086    6165743    .    +    .    ID=7000001843685266;Name=conserved%20hypothetical%20protein;Alias=PITG_01056;
> supercont1.1    .    mRNA    6164086    6165743    .    +    .    ID=7000001843685267;Parent=7000001843685266;Alias=PITT_01056;
> supercont1.1    .    five_prime_UTR    6164086    6164127    .    +    .    ID=7000001843685267.UTR5p1;Parent=7000001843685267
> supercont1.1    .    exon    6164086    6164456    .    +    .    ID=7000001843685267.exon1;Parent=7000001843685267
> supercont1.1    .    CDS    6164128    6164456    .    +    .    ID=7000001843685267.cds1;Parent=7000001843685267
> supercont1.1    .    exon    6164548    6165169    .    +    .    ID=7000001843685267.exon2;Parent=7000001843685267
> supercont1.1    .    CDS    6164548    6165169    .    +    .    ID=7000001843685267.cds2;Parent=7000001843685267
> supercont1.1    .    exon    6165239    6165415    .    +    .    ID=7000001843685267.exon3;Parent=7000001843685267
> supercont1.1    .    CDS    6165239    6165415    .    +    .    ID=7000001843685267.cds3;Parent=7000001843685267
> supercont1.1    .    exon    6165477    6165743    .    +    .    ID=7000001843685267.exon4;Parent=7000001843685267
> supercont1.1    .    CDS    6165477    6165743    .    +    .    ID=7000001843685267.cds4;Parent=7000001843685267
> 
> This defines a gene, its mRNA transcript, and exons that comprise a 5' UTR
> with CDS.  This all seems to me to be correct and in line with Sequence
> Ontology definitions.
> 
> The recommended upload script, both at the GMOD site
> (http://gmod.org/wiki/Load_GFF_Into_Chado), and at the Artemis site
> (http://www.sanger.ac.uk/Software/Artemis/v11/chado/dbloading.shtml), is
> gmod_bulk_load_gff3.pl.
> 
> When I use this script, as is documented in the perldoc, CDS features are
> converted to polypeptide features, and CDS and UTR features are combined
> into exon features (unless I use the --noexon flag when calling the script).
> In general terms I would prefer this not to happen, and to keep separate UTR
> and CDS features within the database, but there doesn't appear to be an
> option to upload the gene models without this conversion.  As no information
> is lost, I can live with this, so long as the downstream interpretation
> makes the correct inferences regarding UTR and CDS.
> 
> Unfortunately, this does not seem to be the case with Artemis.  As noted at
> http://www.sanger.ac.uk/Software/Artemis/v11/chado/overview.shtml#GENE the
> gene model that Artemis assumes defines exons differently to Chado's upload
> script (and, incidentally, differently to the Sequence Ontology itself:
> http://www.sequenceontology.org/miso/current_cvs/term/SO:0000147),
> considering them to be equivalent to CDS features.  This, as you might
> imagine, causes some issues for us.  Artemis manages to open up the GFF3
> file directly without any issues, but as we'd quite like to have multiple
> users modify the same annotation, this isn't an option for us.
> 
> I would be happy with any of the following kinds of solution:
> 
> * pointers to what I'm doing wrong (other than not using Apollo ;) )
> 
> * a script that uploads our GFF3 data into Chado in a form that preserves
> our current gene models for viewing in Artemis (I guess you use one of these
> at the Sanger - would you be able to make one of these publicly-available,
> or point me to one?)
> 
> * a modification to Artemis that infers CDS and UTRs correctly from the
> exon/polypeptide data in Chado, as uploaded by the recommended upload
> script.
> 
> * anything else that makes this work.
> 
> Thanks for your attention,
> 
> L.
> 



_______________________________________________
Artemis-users mailing list
Artemis-users@sanger.ac.uk
http://lists.sanger.ac.uk/mailman/listinfo/artemis-users
