Hi,

Apologies for the cross-post, but I'll take an answer from either camp
if it solves my problem.

I'm using Artemis v11, and Chado/scripts from GMOD-1.0, with PostgreSQL 8.3.

My problem at the moment is this: I have a genome annotation with gene
models that look like they satisfy the Sequence Ontology parent-child
relationships, and are correct.  The upload script for Chado modifies these
gene models in a way that I'm not entirely happy with; I would prefer to
keep the original gene model information.  When viewing the information from
Chado in Artemis, Artemis appears to make invalid assumptions about the gene
model in Chado; it does not do this when reading the annotation from a local
.gff3 file.

My example case:

I have a GFF3 format file with entries like this:

supercont1.1    .    gene    6164086    6165743    .    +    .    ID=7000001843685266;Name=conserved%20hypothetical%20protein;Alias=PITG_01056;
supercont1.1    .    mRNA    6164086    6165743    .    +    .    ID=7000001843685267;Parent=7000001843685266;Alias=PITT_01056;
supercont1.1    .    five_prime_UTR    6164086    6164127    .    +    .    ID=7000001843685267.UTR5p1;Parent=7000001843685267
supercont1.1    .    exon    6164086    6164456    .    +    .    ID=7000001843685267.exon1;Parent=7000001843685267
supercont1.1    .    CDS    6164128    6164456    .    +    .    ID=7000001843685267.cds1;Parent=7000001843685267
supercont1.1    .    exon    6164548    6165169    .    +    .    ID=7000001843685267.exon2;Parent=7000001843685267
supercont1.1    .    CDS    6164548    6165169    .    +    .    ID=7000001843685267.cds2;Parent=7000001843685267
supercont1.1    .    exon    6165239    6165415    .    +    .    ID=7000001843685267.exon3;Parent=7000001843685267
supercont1.1    .    CDS    6165239    6165415    .    +    .    ID=7000001843685267.cds3;Parent=7000001843685267
supercont1.1    .    exon    6165477    6165743    .    +    .    ID=7000001843685267.exon4;Parent=7000001843685267
supercont1.1    .    CDS    6165477    6165743    .    +    .    ID=7000001843685267.cds4;Parent=7000001843685267

This defines a gene, its mRNA transcript, and exons that comprise a 5' UTR
and CDS.  This all seems to me to be correct and in line with Sequence
Ontology definitions.
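
For anyone who wants to check the parent-child structure mechanically, here
is a minimal Python sketch of the kind of test I have in mind.  It is not
part of Artemis or GMOD; the function names parse_gff3 and check_parents are
my own, and it assumes one feature per line with nine whitespace-separated
columns.  It only checks that every Parent refers to a known ID and that each
child lies within its parent's coordinates, which is far short of full
Sequence Ontology validation.

```python
def parse_gff3(text):
    """Parse GFF3 feature lines into a list of dicts (hypothetical helper)."""
    features = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split(None, 8)  # at most nine columns
        if len(cols) != 9:
            continue  # not a complete feature line
        attrs = {}
        for pair in cols[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attrs[key] = value
        features.append({
            "seqid": cols[0],
            "type": cols[2],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "attrs": attrs,
        })
    return features


def check_parents(features):
    """Return a list of messages describing broken parent-child links."""
    by_id = {f["attrs"]["ID"]: f for f in features if "ID" in f["attrs"]}
    problems = []
    for f in features:
        parents = f["attrs"].get("Parent")
        if parents is None:
            continue  # top-level feature such as a gene
        for pid in parents.split(","):
            parent = by_id.get(pid)
            name = f["attrs"].get("ID", f["type"])
            if parent is None:
                problems.append("%s: unknown parent %s" % (name, pid))
            elif f["start"] < parent["start"] or f["end"] > parent["end"]:
                problems.append("%s: outside parent %s" % (name, pid))
    return problems
```

An empty list from check_parents(parse_gff3(open(path).read())) would mean
that no dangling or out-of-range children were found.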

The recommended upload script, both at the GMOD site
(http://gmod.org/wiki/Load_GFF_Into_Chado), and at the Artemis site
(http://www.sanger.ac.uk/Software/Artemis/v11/chado/dbloading.shtml), is
gmod_bulk_load_gff3.pl.

When I use this script, as is documented in the perldoc, CDS features are
converted to polypeptide features, and CDS and UTR features are combined
into exon features (unless I use the --noexon flag when calling the script).
In general terms I would prefer this not to happen, and to keep separate UTR
and CDS features within the database, but there doesn't appear to be an
option to upload the gene models without this conversion.  As no information
is lost, I can live with this, so long as the downstream interpretation
makes the correct inferences regarding UTR and CDS.
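
To illustrate why I say no information is lost: the UTRs can be recovered by
subtracting the coding span from the exons.  The following Python is only a
sketch under my own assumptions, not how Artemis or the GMOD loader actually
work.  The function name infer_utrs is hypothetical, coordinates are 1-based
and inclusive as in GFF3, and it assumes a single transcript whose coding
region runs from the lowest CDS start to the highest CDS end.

```python
def infer_utrs(exons, cds, strand="+"):
    """Split exon sequence outside the coding span into 5' and 3' UTRs.

    exons, cds: lists of (start, end) tuples, 1-based inclusive.
    Returns (five_prime_utrs, three_prime_utrs).
    """
    cds_start = min(start for start, _ in cds)
    cds_end = max(end for _, end in cds)
    left, right = [], []
    for start, end in sorted(exons):
        if start < cds_start:
            # exon sequence to the left of the coding span
            left.append((start, min(end, cds_start - 1)))
        if end > cds_end:
            # exon sequence to the right of the coding span
            right.append((max(start, cds_end + 1), end))
    if strand == "-":
        # on the reverse strand the rightmost sequence is the 5' end
        return right, left
    return left, right


exons = [(6164086, 6164456), (6164548, 6165169),
         (6165239, 6165415), (6165477, 6165743)]
cds = [(6164128, 6164456), (6164548, 6165169),
       (6165239, 6165415), (6165477, 6165743)]
five_prime, three_prime = infer_utrs(exons, cds, "+")
```

For the example gene above this gives a single 5' UTR of 6164086..6164127,
matching the five_prime_UTR line in my GFF3 file, and no 3' UTR.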

Unfortunately, this does not seem to be the case with Artemis.  As noted at
http://www.sanger.ac.uk/Software/Artemis/v11/chado/overview.shtml#GENE the
gene model that Artemis assumes defines exons differently to Chado's upload
script (and, incidentally, differently to the Sequence Ontology itself:
http://www.sequenceontology.org/miso/current_cvs/term/SO:0000147),
considering them to be equivalent to CDS features.  This, as you might
imagine, causes some issues for us.  Artemis manages to open up the GFF3
file directly without any issues, but as we'd quite like to have multiple
users modify the same annotation, this isn't an option for us.

I would be happy with any of the following kinds of solution:

* pointers to what I'm doing wrong (other than not using Apollo ;) )

* a script that uploads our GFF3 data into Chado in a form that preserves
our current gene models for viewing in Artemis (I guess you use one of these
at the Sanger; would you be able to make it publicly available, or point me
to one?)

* a modification to Artemis that infers CDS and UTRs correctly from the
exon/polypeptide data in Chado, as uploaded by the recommended upload
script.

* anything else that makes this work.

Thanks for your attention,

L.


-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpr...@scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405


