Hi,

There were a couple of reasons why it was decided to store the gene models that way in the Pathogen database here. People felt that they might want to annotate to a UTR, and in some ways it was also a closer representation of how standard Artemis works with separate UTR and CDS features.
However, I will now have a look at this so that Artemis can optionally work with this alternative representation and so that it correctly infers the UTRs.

Regards
Tim

On 6/5/09 4:44 PM, "Leighton Pritchard" <lpr...@scri.ac.uk> wrote:

> Hi,
>
> With apologies for the cross-post, but I'll take an answer from either camp
> if it solves my problem.
>
> I'm using Artemis v11, and Chado/scripts from GMOD-1.0, with PostgreSQL 8.3
>
> My problem at the moment is this: I have a genome annotation with gene
> models that look like they satisfy the Sequence Ontology parent-child
> relationships, and are correct. The upload script for Chado modifies these
> gene models in a way that I'm not entirely happy with; I would prefer to
> keep the original gene model information. When viewing the information from
> Chado in Artemis, Artemis appears to make invalid assumptions about the gene
> model in Chado; it does not do this when reading the annotation from a local
> .gff3 file.
>
> My example case:
>
> I have a GFF3 format file with entries like this:
>
> supercont1.1 . gene 6164086 6165743 . + . ID=7000001843685266;Name=conserved%20hypothetical%20protein;Alias=PITG_01056;
> supercont1.1 . mRNA 6164086 6165743 . + . ID=7000001843685267;Parent=7000001843685266;Alias=PITT_01056;
> supercont1.1 . five_prime_UTR 6164086 6164127 . + . ID=7000001843685267.UTR5p1;Parent=7000001843685267
> supercont1.1 . exon 6164086 6164456 . + . ID=7000001843685267.exon1;Parent=7000001843685267
> supercont1.1 . CDS 6164128 6164456 . + . ID=7000001843685267.cds1;Parent=7000001843685267
> supercont1.1 . exon 6164548 6165169 . + . ID=7000001843685267.exon2;Parent=7000001843685267
> supercont1.1 . CDS 6164548 6165169 . + . ID=7000001843685267.cds2;Parent=7000001843685267
> supercont1.1 . exon 6165239 6165415 . + . ID=7000001843685267.exon3;Parent=7000001843685267
> supercont1.1 . CDS 6165239 6165415 . + . ID=7000001843685267.cds3;Parent=7000001843685267
> supercont1.1 . exon 6165477 6165743 . + . ID=7000001843685267.exon4;Parent=7000001843685267
> supercont1.1 . CDS 6165477 6165743 . + . ID=7000001843685267.cds4;Parent=7000001843685267
>
> This defines a gene, transcript mRNA, and exons that comprise a 5' UTR with
> CDS. This all seems to me to be correct and in line with Sequence Ontology
> definitions.
>
> The recommended upload script, both at the GMOD site
> (http://gmod.org/wiki/Load_GFF_Into_Chado), and at the Artemis site
> (http://www.sanger.ac.uk/Software/Artemis/v11/chado/dbloading.shtml), is
> gmod_bulk_load_gff3.pl.
>
> When I use this script, as is documented in the perldoc, CDS features are
> converted to polypeptide features, and CDS and UTR features are combined
> into exon features (unless I use the --noexon flag when calling the script).
> In general terms I would prefer this not to happen, and to keep separate UTR
> and CDS features within the database, but there doesn't appear to be an
> option to upload the gene models without this conversion. As no information
> is lost, I can live with this, so long as the downstream interpretation
> makes the correct inferences regarding UTR and CDS.
>
> Unfortunately, this does not seem to be the case with Artemis. As noted at
> http://www.sanger.ac.uk/Software/Artemis/v11/chado/overview.shtml#GENE the
> gene model that Artemis assumes defines exons differently to Chado's upload
> script (and, incidentally, differently to the Sequence Ontology itself:
> http://www.sequenceontology.org/miso/current_cvs/term/SO:0000147),
> considering them to be equivalent to CDS features. This, as you might
> imagine, causes some issues for us. Artemis manages to open up the GFF3
> file directly without any issues, but as we'd quite like to have multiple
> users modify the same annotation, this isn't an option for us.
>
> I would be happy with any of the following kinds of solution:
>
> * pointers to what I'm doing wrong (other than not using Apollo ;) )
>
> * a script that uploads our GFF3 data into Chado in a form that preserves
> our current gene models for viewing in Artemis (I guess you use one of these
> at the Sanger - would you be able to make one of these publicly available,
> or point me to one?)
>
> * a modification to Artemis that infers CDS and UTRs correctly from the
> exon/polypeptide data in Chado, as uploaded by the recommended upload
> script.
>
> * anything else that makes this work.
>
> Thanks for your attention,
>
> L.
>
_______________________________________________
Artemis-users mailing list
Artemis-users@sanger.ac.uk
http://lists.sanger.ac.uk/mailman/listinfo/artemis-users
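
For anyone reading this thread later: the inference asked for in the third option above (recovering CDS and UTR segments from the exon/polypeptide data) comes down to simple interval arithmetic. The sketch below is only an illustration, not what Artemis or gmod_bulk_load_gff3.pl actually does. It assumes the polypeptide feature spans from the first to the last coding base, and the function name split_exons is made up for this example. Coordinates are 1-based and inclusive, as in GFF3.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive, as in GFF3


def split_exons(exons: List[Interval], coding: Interval,
                strand: str = "+") -> Dict[str, List[Interval]]:
    """Split merged exons into 5' UTR, CDS and 3' UTR segments.

    `coding` is the span from the first to the last coding base. Here it is
    ASSUMED to be the extent of the polypeptide feature stored with the exons.
    """
    cds_start, cds_end = coding
    upstream, cds, downstream = [], [], []
    for start, end in sorted(exons):
        # Part of the exon lying before the coding span.
        if start < cds_start:
            upstream.append((start, min(end, cds_start - 1)))
        # Part of the exon overlapping the coding span.
        if end >= cds_start and start <= cds_end:
            cds.append((max(start, cds_start), min(end, cds_end)))
        # Part of the exon lying after the coding span.
        if end > cds_end:
            downstream.append((max(start, cds_end + 1), end))
    # On the reverse strand the higher coordinates are the 5' end.
    if strand == "-":
        upstream, downstream = downstream, upstream
    return {"five_prime_UTR": upstream, "CDS": cds,
            "three_prime_UTR": downstream}


# Exon coordinates from the example gene model in this thread; the coding
# span runs from the start of cds1 to the end of cds4.
exons = [(6164086, 6164456), (6164548, 6165169),
         (6165239, 6165415), (6165477, 6165743)]
parts = split_exons(exons, coding=(6164128, 6165743), strand="+")
print(parts["five_prime_UTR"])
```

With the example model from the original post this gives back the single five_prime_UTR segment 6164086..6164127 and the four CDS segments, so the exon/polypeptide form does carry enough information to rebuild the separate features, as long as the coding span assumption holds.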