Hi Sim,

When a gene transcript is marked "CDS is incomplete", there is no full definition of the gene: the information needed to define all of its exons is missing, and exons that are not present in the record cannot be recovered.
Gene prediction tracks are imprecise "predictions" of what might be there; the predicted genes are not actually part of the reference sequence. The mRNA that was used to "predict" a gene at this location may have come from some other tissue that has a different actual DNA sequence. We can map the mRNA almost exactly to the reference sequence, but a few bases may be missing from the mapping, so we mark the CDS as "incomplete".

One of our engineers suggests reading the papers on N-SCAN and Augustus.

If you have further questions, please don't hesitate to contact the mailing list: [email protected].

Vanessa Kirkup Swing
UCSC Genome Bioinformatics Group

----- Original Message -----
From: "SIM Ngak Leng" <[email protected]>
To: [email protected]
Sent: Wednesday, May 25, 2011 6:58:18 PM
Subject: [Genome] Assembling exons from incomplete genes

Greetings,

I am trying to programmatically assemble exons using the data obtained from the UCSC website (ensGene.txt.gz, ccdsGene.txt.gz, etc.) to form the resulting amino acid sequence. I understand that some records are incomplete (i.e., cdsStartStat isn't cmpl), so the length of the nucleotide sequence obtained isn't a multiple of 3. Is there any way to generate the protein from these records using exonFrames, or other methods? And if so, how should I go about doing it?

Thank you in advance.

Regards,
Sim Ngak Leng
Bioinformatics Specialist
Genome Institute of Singapore

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
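
On the exonFrames question in the original message, here is a minimal Python sketch of one possible approach. It is not an official UCSC method. It assumes genePred-style coordinates (0-based exon starts, exclusive exon ends), that exonFrames is listed in the same genomic order as exonStarts, and that a frame of 1 or 2 means the exon's first coding base, in the direction of transcription, is the second or third base of a codon. The function names, the toy genome string, and the choice to simply drop partial codons at either end are illustrative assumptions.

```python
# Sketch: translate a genePred-style record whose CDS may be incomplete.
# All names here are hypothetical; the frame convention is an assumption
# (frame = offset of the exon's first coding base within its codon).

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def cds_blocks(exon_starts, exon_ends, cds_start, cds_end):
    """Clip each exon to the CDS; return (start, end, exon_index) tuples."""
    blocks = []
    for i, (s, e) in enumerate(zip(exon_starts, exon_ends)):
        s, e = max(s, cds_start), min(e, cds_end)
        if s < e:
            blocks.append((s, e, i))
    return blocks


def translate_record(genome, strand, exon_starts, exon_ends,
                     cds_start, cds_end, exon_frames):
    """Return the protein for the whole codons present in the record."""
    blocks = cds_blocks(exon_starts, exon_ends, cds_start, cds_end)
    if not blocks:
        return ""  # non-coding record
    cds = "".join(genome[s:e] for s, e, _ in blocks)
    if strand == "-":
        cds = revcomp(cds)
        first_exon = blocks[-1][2]  # 5'-most coding exon on the minus strand
    else:
        first_exon = blocks[0][2]
    frame = exon_frames[first_exon]
    if frame not in (0, 1, 2):
        raise ValueError("coding exon has no usable frame")
    # Skip the leftover bases of a partial first codon (incomplete 5' end).
    cds = cds[(3 - frame) % 3:]
    # Drop a partial last codon (incomplete 3' end).
    cds = cds[: len(cds) - len(cds) % 3]
    return "".join(
        CODON_TABLE.get(cds[i:i + 3].upper(), "X")
        for i in range(0, len(cds), 3)
    )


# Toy example: the full CDS would be ATG GCC AAA TAA, but the record is
# missing the first base, so the first coding exon has frame 1.
toy_genome = "CCTGGCCGTAAGTAAATAAGG"
print(translate_record(toy_genome, "+", [2, 13], [7, 19], 2, 19, [1, 0]))
# prints AK*
```

In this sketch, the two leftover bases of the broken first codon are discarded, so the resulting protein lacks its true first residue. Codons containing N or other ambiguity codes are reported as X.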
