Hello Bill,

There are a few things going on that may contribute to the differences you noticed, but overall, we double-checked and think that the exon concatenation is being performed correctly. With a gene this large, a perfect alignment is not expected.
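As an illustration of what "exon concatenation" involves, here is a minimal Python sketch. It is not the Genome Browser's own code. It assumes UCSC genePred-style coordinates (0-based exon starts, half-open exon ends, plus a strand), and it uses a hypothetical toy sequence and hypothetical function names. Each exon's span is cut out of the genomic sequence and the pieces are joined in genomic order. For a minus-strand gene, the result is then reverse-complemented.

```python
# Minimal sketch, not UCSC's implementation: build a transcript sequence by
# concatenating exon segments of a genomic sequence. Coordinates are assumed
# to follow the genePred convention (0-based starts, half-open ends).
# All names and the toy data are hypothetical.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def concatenate_exons(genome: str, exon_starts, exon_ends,
                      strand: str = "+", offset: int = 0) -> str:
    """Join genome[start:end] for each exon, in genomic order.

    `offset` is the chromosome position of genome[0], so chromosome
    coordinates can be used with a fetched subsequence (hypothetical helper
    parameter). Minus-strand results are reverse-complemented so they read
    5' to 3' along the transcript.
    """
    if len(exon_starts) != len(exon_ends):
        raise ValueError("exon_starts and exon_ends must have the same length")
    pieces = [genome[s - offset:e - offset]
              for s, e in sorted(zip(exon_starts, exon_ends))]
    joined = "".join(pieces)
    return reverse_complement(joined) if strand == "-" else joined


if __name__ == "__main__":
    toy_genome = "ATGAAAgtaagCCCTAA"  # exon, intron (lowercase), exon
    print(concatenate_exons(toy_genome, [0, 11], [6, 17]))       # ATGAAACCCTAA
    print(concatenate_exons(toy_genome, [0, 11], [6, 17], "-"))  # TTAGGGTTTCAT
```

Any intronic bases in the output of such a procedure would come from the exon coordinates themselves. They would not come from the joining step.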
In your first query, data from the RefSeq Genes track is being returned, and that data is the mRNA sequence itself. For many tracks, the mRNA can differ slightly from the genomic reference sequence.

In your second query, data from the UCSC Genes track is being returned. This is the genomic reference sequence, restricted to the portions covered by the mRNA alignment. The UCSC Genes track includes the RefSeq sequence above (in this case an earlier version, as explained next), with some input from other data sources.

In general, we would expect both tracks to have very similar data. However, a closer inspection shows that the UCSC Genes track uses an older version of the RefSeq (.2), while the RefSeq Genes track uses the brand new (Nov 2009) version (.3). The GenBank record describes the assembly process for this transcript and notes that portions of genomic sequence were used. For version .3 this was very likely NCBI Build 37. The previous version either did not use genomic sequence or used an earlier NCBI Build (36 or earlier). The GenBank record does not specifically state which version of the human genome was used for either version, but this is a likely assumption based on the release dates of the versions. If the change is of interest to you, you could follow up on it with a little analysis (or better, by contacting the RefSeq project and asking).

Thank you for so carefully examining the data and explaining the situation. If you notice anything else a bit off and would like us to look into it, please feel welcome to send questions like this along.
Jennifer
------------------------------------------------
Jennifer Jackson
UCSC Genome Bioinformatics Group

----- "Bill Dickinson" <[email protected]> wrote:

> From: "Bill Dickinson" <[email protected]>
> To: [email protected]
> Sent: Monday, December 7, 2009 12:02:55 PM GMT -08:00 US/Canada Pacific
> Subject: [Genome] sequence differences between TTN mRNA block and concatenated exons
>
> Hello,
>
> I compared the mRNA sequence block for the TTN gene obtained from (1):
> http://genome.ucsc.edu/cgi-bin/hgc?hgsid=147684681&g=htcCdnaAli&i=NM_133378&c=chr2&l=179098963&r=179380395&o=179098963&aliTrack=refSeqAli&table=refGene
>
> with the concatenation of exon coding sequences obtained from (2):
> http://genome.ucsc.edu/cgi-bin/hgc?hgsid=147684681&g=htcGeneInGenome&i=NM_133378&c=chr2&l=179098963&r=179380395&o=refGene&table=refGene
>
> and found many differences which, in part, appear to be the result of
> including pieces of intronic sequence with exons in the latter step (2).
>
> I used the ATG in block two (from the mRNA block) and stripped off all
> sequence before this start codon in both sequence sets before the
> comparison.
>
> I wonder if the process that extracts the exons (step 2) might have a
> problem with this large gene.
>
> Regards,
>
> Bill Dickinson
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
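The comparison described in the quoted message can be sketched as follows. The procedure is to strip everything before the chosen start codon in both sequences and then compare them. The helper names, the `start` search parameter, and the toy sequences are hypothetical. A position-by-position comparison like this one is sensitive to insertions and deletions. A single extra base in one sequence shifts every later position, so an alignment tool is a more reliable way to compare an mRNA with genomic exon sequence.

```python
# Minimal sketch of a naive comparison: trim to a start codon, then compare
# position by position. Helper names and toy data are hypothetical.
from typing import List


def trim_to_start(seq: str, start: int = 0) -> str:
    """Drop everything before the first 'ATG' at or after index `start`."""
    idx = seq.upper().find("ATG", start)
    if idx == -1:
        raise ValueError("no ATG found at or after the given position")
    return seq[idx:]


def positional_mismatches(a: str, b: str) -> List[int]:
    """Return 0-based positions where a and b differ (case-insensitive).

    Only the overlapping length is compared. An insertion or deletion in one
    sequence shifts every later position, so this can report many mismatches
    where a real alignment would show a single gap.
    """
    return [i for i, (x, y) in enumerate(zip(a.upper(), b.upper())) if x != y]


if __name__ == "__main__":
    a = trim_to_start("GGATGAAACCCTAA")  # "ATGAAACCCTAA"
    b = "ATGAAAACCCTAA"                  # same sequence with one extra A
    print(positional_mismatches(a, b))   # [6, 9, 10]
```

In the usage example, one inserted base produces three reported mismatches rather than one gap. This is why a gapped alignment gives a clearer picture of how two versions of a long transcript differ.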
