Hi Tarik H.,

What you have noticed is how the alignment coordinates are stored in the
UCSC database system.

The basic convention is:

1) coordinates are stored smallest -> largest, regardless of strand
2) exons (and inferred introns) of alignments on the (-) strand will
   therefore appear in reverse order relative to the direction of
   transcription. The labels start/end should be interpreted as
   end/start for alignments on the (-) strand
3) in addition, the coordinates are formatted to be "zero-based,
   half-open"

Some related help documents:
http://genome.ucsc.edu/FAQ/FAQdownloads#download20
http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms
http://genome.ucsc.edu/FAQ/FAQformat#format2
http://genome.ucsc.edu/FAQ/FAQformat#format9

And a concise definition is in this prior response:
https://lists.soe.ucsc.edu/pipermail/genome/2009-June/019359.html

Hopefully this helps to clarify how the data are stored, but please let
us know if you have other questions,

Jennifer
Genome Bioinformatics Group

> ------------------------------------------------------------------------
>
> Subject: Drosophila UCSC versus Flybase Intron annotation
> From: "Hadzic, Tarik" <[email protected]>
> Date: Mon, 25 Jan 2010 14:18:37 -0600
> To: "[email protected]" <[email protected]>
>
> Hello,
> I recently found a quirky bug with Drosophila melanogaster gene
> annotation in your database. I am studying a gene on the - strand, and I
> found that the annotations I retrieved label incorrectly introns of all
> genes on the - strand. What happens is that introns of genes on the -
> strand are reversed so that the last intron is labeled as the first
> intron.
>
> Here's an example: gene *CG3832* in FlyBase has the following Intron1
> annotation:
> >intron_CG3832:1_CG3832:2 type=intron;
> loc=2R:complement(19875199..19875636); name=Phm-in;
> parent=FBgn0019948,FBtr0072202,FBtr0072203;
> MD5=26d4eb1b7deec01431ad6970766eff18; release=r5.23; species=Dmel;
> length=438;
>
> Nevertheless, when I use Galaxy and do an *intron_0_* select on all Dm3
> introns from UCSC data tables (which are retrieved from the FlyBase
> track, so this makes no sense) states the following:
> chr2R 19873218 19873285 CG3832-RA_intron_0_0_chr2R_19873219_r 0 -
> chr2R 19873218 19873285 CG3832-RB_intron_0_0_chr2R_19873219_r 0 -
>
> So: *intron_0_* for CG3832 from Galaxy/UCSC is in fact the last intron
> in Flybase:
> >intron_CG3832:7_CG3832:8 type=intron;
> loc=2R:complement(19873219..19873285); name=Phm-in;
> parent=FBgn0019948,FBtr0072202; MD5=383770efb08aa6a9788e3efaa7ceda62;
> release=r5.23; species=Dmel; length=67;
>
> This particular gene has 7 introns so if I run *select* on all Dm3
> introns for intron_6_ then I can get my 1st intron for CG3832:
> chr2R 19875198 19875636 CG3832-RA_intron_6_0_chr2R_19875199_r 0 -
> chr2R 19875198 19875636 CG3832-RB_intron_6_0_chr2R_19875199_r 0 -
>
> How can I get around this issue? Since genes have different numbers of
> intron, it's impossible to simply reverse the logic. There must be a way
> to fix this problem. I know that RefSeq and FlyBase are not always the
> same, but I think RefSeq gives me the same problem. In general, I'd like
> to stick with FlyBase because they're the best for fly genomics.
>
> I would really appreciate some help.
>
> Thanks again,
>
> Tarik H.
> Washington University in St Louis
> School of Medicine
> MSTP
> Taghert Lab

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
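
For anyone working through the same mismatch, here is a minimal Python sketch of the convention described in the reply. It is not a UCSC or Galaxy tool, and the function names bed_to_one_based and transcript_order_number are made up for illustration. It assumes the intron index in the Galaxy/UCSC name (the number after "intron_") counts from 0 in left-to-right genomic order, and it takes the total of 7 introns for CG3832 from the quoted message.

```python
def bed_to_one_based(start, end):
    """Convert a zero-based, half-open interval to one-based, fully closed.

    Only the start shifts by one; the end value stays the same.
    """
    return start + 1, end


def transcript_order_number(index, intron_count, strand):
    """Map a zero-based, left-to-right intron index to a one-based intron
    number counted in the direction of transcription.

    Hypothetical helper: intron_count must be the number of introns in
    that particular transcript.
    """
    if strand == "-":
        # On the minus strand the leftmost intron is the last one transcribed.
        return intron_count - index
    return index + 1


# Rows for CG3832-RA copied from the quoted message (7 introns, - strand):
# chrom, start, end, index taken from the "intron_N_" part of the name, strand
rows = [
    ("chr2R", 19873218, 19873285, 0, "-"),
    ("chr2R", 19875198, 19875636, 6, "-"),
]

for chrom, start, end, index, strand in rows:
    first, last = bed_to_one_based(start, end)
    number = transcript_order_number(index, 7, strand)
    # For half-open coordinates the length is simply end - start.
    print(f"{chrom}:{first}..{last} intron {number} length {end - start}")
```

With the rows above, intron_0_ comes out as 19873219..19873285 (intron 7, length 67) and intron_6_ as 19875199..19875636 (intron 1, length 438), which agrees with the FlyBase records quoted in the message. Because transcripts differ in intron count, the count has to be worked out per transcript, for example by counting the rows that share a transcript name, before renumbering.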
