Hi Tarik H.,

What you have noticed is a consequence of how alignment coordinates
are stored in the UCSC database system.

The basic convention is:
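1) coordinates are stored smallest -> largest along the (+) strand,
regardless of the strand of the feature
2) for alignments on the (-) strand, exons (and inferred introns) therefore
appear in reverse transcript order, and the labels start/end should be
interpreted as end/start
3) in addition, the coordinates are formatted to be "zero-based, half-open":
the start is 0-based and the end is 1-based, so a UCSC start of 19875198
corresponds to position 19875199 in FlyBase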
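
As an illustration of the convention above, here is a minimal Python sketch
that renumbers introns in transcript order and converts the coordinates to
one-based positions. It is not a UCSC or Galaxy tool: the function names are
hypothetical, the name layout (<transcript>_intron_<index>_...) is inferred
from the Galaxy output quoted below, and it assumes that every intron of a
transcript is present in the input, since the intron count is taken from the
highest index seen.

```python
import re
from collections import defaultdict

# Assumed name layout, inferred from the quoted Galaxy output:
#   <transcript>_intron_<index>_0_<chrom>_<start+1>_<f|r>
NAME_RE = re.compile(r"^(?P<tx>.+)_intron_(?P<idx>\d+)_")


def to_one_based(start, end):
    """Convert a zero-based, half-open interval to one-based, closed."""
    return start + 1, end


def transcript_intron_number(genomic_index, intron_count, strand):
    """Map a 0-based genomic-order index to a 1-based transcript-order number."""
    if strand == "-":
        # On the (-) strand the lowest genomic coordinate is the LAST intron.
        return intron_count - genomic_index
    return genomic_index + 1


def renumber(bed_lines):
    """Return (transcript, intron_number, chrom, start, end, strand) tuples.

    The intron count per transcript is inferred from the highest index seen,
    so the input must contain all introns of each transcript.
    """
    rows = []
    counts = defaultdict(int)
    for line in bed_lines:
        chrom, start, end, name, _score, strand = line.split()[:6]
        match = NAME_RE.match(name)
        if match is None:
            continue  # not an intron line in the assumed layout
        tx, idx = match.group("tx"), int(match.group("idx"))
        counts[tx] = max(counts[tx], idx + 1)
        rows.append((tx, idx, chrom, int(start), int(end), strand))

    result = []
    for tx, idx, chrom, start, end, strand in rows:
        number = transcript_intron_number(idx, counts[tx], strand)
        result.append((tx, number, chrom) + to_one_based(start, end) + (strand,))
    return result
```

With the two CG3832-RA lines from the message below, intron_6 should come
out as intron 1 at 19875199..19875636 and intron_0 as intron 7 at
19873219..19873285, which agrees with the FlyBase records quoted there.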

Some related help documents:
http://genome.ucsc.edu/FAQ/FAQdownloads#download20
http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms
http://genome.ucsc.edu/FAQ/FAQformat#format2
http://genome.ucsc.edu/FAQ/FAQformat#format9

And a concise definition is in this prior response:
https://lists.soe.ucsc.edu/pipermail/genome/2009-June/019359.html

Hopefully this helps to clear things up, but please
let us know if you have other questions.

Jennifer
Genome Bioinformatics Group
 

> ------------------------------------------------------------------------
> > 
> > Subject: Drosophila UCSC versus Flybase Intron annotation
> > From: "Hadzic, Tarik" <[email protected]>
> > Date: Mon, 25 Jan 2010 14:18:37 -0600
> > To: "[email protected]" <[email protected]>
> > 
> > 
> > Hello,
> > I recently found a quirky bug with Drosophila melanogaster gene
> > annotation in your database. I am studying a gene on the - strand, and I
> > found that the annotations I retrieved incorrectly label the introns of
> > all genes on the - strand. What happens is that introns of genes on the -
> > strand are reversed so that the last intron is labeled as the first
> > intron.
> > 
> > Here's an example: gene *CG3832* in FlyBase has the following Intron1
> > annotation:
> >  >intron_CG3832:1_CG3832:2 type=intron;
> > loc=2R:complement(19875199..19875636); name=Phm-in;
> > parent=FBgn0019948,FBtr0072202,FBtr0072203;
> > MD5=26d4eb1b7deec01431ad6970766eff18; release=r5.23; species=Dmel;
> > length=438;
> > 
> > Nevertheless, when I use Galaxy and do an *intron_0_* select on all Dm3
> > introns from UCSC data tables (which are retrieved from the FlyBase
> > track, so this makes no sense), I get the following:
> > chr2R 19873218 19873285 CG3832-RA_intron_0_0_chr2R_19873219_r 0 -
> > chr2R 19873218 19873285 CG3832-RB_intron_0_0_chr2R_19873219_r 0 -
> > 
> > So: *intron_0_* for CG3832 from Galaxy/UCSC is in fact the last intron
> > in FlyBase:
> >  >intron_CG3832:7_CG3832:8 type=intron;
> > loc=2R:complement(19873219..19873285); name=Phm-in;
> > parent=FBgn0019948,FBtr0072202; MD5=383770efb08aa6a9788e3efaa7ceda62;
> > release=r5.23; species=Dmel; length=67;
> > 
> > This particular gene has 7 introns so if I run *select* on all Dm3
> > introns for intron_6_ then I can get my 1st intron for CG3832:
> > chr2R 19875198 19875636 CG3832-RA_intron_6_0_chr2R_19875199_r 0 -
> > chr2R 19875198 19875636 CG3832-RB_intron_6_0_chr2R_19875199_r 0 -
> > 
> > How can I get around this issue? Since genes have different numbers of
> > introns, it's impossible to simply reverse the logic. There must be a way
> > to fix this problem. I know that RefSeq and FlyBase are not always the
> > same, but I think RefSeq gives me the same problem. In general, I'd like
> > to stick with FlyBase because they're the best for fly genomics.
> > 
> > I would really appreciate some help.
> > 
> > Thanks again,
> > 
> > Tarik H.
> > Washington University in St Louis
> > School of Medicine
> > MSTP
> > Taghert Lab
> > 
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome