Thanks a lot, Brooke, for your detailed answer. Everything is clear to me now.

Mali

On Wed, Jul 6, 2011 at 10:36 PM, Brooke Rhead <[email protected]> wrote:

> Hi Mali,
>
> The answer to your question is more complicated than I thought it would be.
> The sacCer2 assembly at UCSC and the EF2 assembly at Ensembl are *almost*
> the same build.
>
> Ensembl's site says that the EF2 genome is from March 2010:
> http://apr2011.archive.ensembl.org/Saccharomyces_cerevisiae/Info/Index?db=core
>
> The UCSC sacCer2 genome "is based on sequence dated June 2008 in the
> Saccharomyces Genome Database (SGD)":
> http://genome.ucsc.edu/cgi-bin/hgGateway?&db=sacCer2
>
> However, there are some tiny differences between the two genome builds.
> One of our engineers summed it up:
> ---
> The Ensembl EF2 sequence is different from UCSC sacCer2 sequence
> in six bases on two chromosomes. UCSC has one extra T in chrX
> and one more C and T, and three less G on chrXIV
>
> EF2 has:
> #seq    len     A       C       G       T
> X       745741  231168  142294  143873  228406
> XIV     784334  241562  151655  151389  239728
>
> UCSC sacCer2 has these two chromosomes as:
> #seq    len     A       C       G       T
> chrX    745742  231168  142294  143873  228407
> chrXIV  784333  241562  151656  151386  239729
> ---
>
> So, sacCer2 and EF2 are slightly different. The gene coordinates in UCSC's
> Ensembl Genes track are downloaded directly from Ensembl. Since they are
> given in Ensembl's EF2 coordinates, some of the annotations on chromosomes
> 10 and 14 are off by one base when they are displayed on the sacCer2 genome
> browser.
>
> You can see some examples of the problem on chromosomes 10 and 14 by
> turning on both the "SGD Genes" (created from data downloaded from SGD on
> January 30, 2009; see:
> http://genome.ucsc.edu/cgi-bin/hgGene?hgg_do_kgMethod=1)
> and "Ensembl Genes" (updated with each Ensembl update -- currently on
> version 62) tracks in the Genome Browser. For instance:
>
> SOR1
> chrX:736035-737108 sgdGene
> chrX:736034-737107 ensGene
>
> PAU6
> chrXIV:781918-782280 sgdGene
> chrXIV:781919-782281 ensGene
>
> I hope this helps you decide what data to work with. If you have further
> questions for us, please feel free to write back to [email protected].
>
> --
> Brooke Rhead
> UCSC Genome Bioinformatics Group
>
> On 06/30/11 11:37, mali salmon wrote:
>
>> Thanks Brooke
>> So I just want to be sure that I use the same build for the sequences and
>> the annotation. I use the sacCer2 genome I downloaded from your ftp site,
>> and a gtf file of annotated ensembl genes from the table browser (for
>> sacCer2). Is this OK? Are the genome and the annotation of the same build?
>> What confuses me is that there was no difference in the genomic locations
>> between ensembl genes downloaded from the table browser (sacCer2 June 2008)
>> and those for EF2 from the ensembl site. I thought that sacCer2 and EF2 are
>> two different builds of the genome. Am I wrong?
>> Thanks for your help
>> Mali
>>
>> On Thu, Jun 30, 2011 at 7:25 PM, Brooke Rhead <[email protected]> wrote:
>>
>>> Hello Mali,
>>>
>>> I see what you are talking about now. Thank you for clarifying.
>>>
>>> I confirmed with our engineers that the "(lifted to sacCer2 from Ensembl
>>> version EF 2)" comment was only applicable to version 59 of Ensembl Genes
>>> on sacCer2. Versions 60, 61, and 62 are identical to Ensembl. The comment
>>> was incorrect and has been removed.
>>>
>>> Thank you for alerting us to this error, and sorry for the confusion!
>>>
>>> --
>>> Brooke Rhead
>>> UCSC Genome Bioinformatics Group
>>>
>>> On 06/29/11 22:01, mali salmon wrote:
>>>
>>>> Dear Brooke
>>>> Thanks for your reply. I'm not on a mirror site, but using the main UCSC
>>>> site.
>>>> In order to download the gtf file I went to the table browser, and chose
>>>> the following:
>>>> genome: S.cerevisiae
>>>> assembly: June 2008, SGD/sacCer2
>>>> group: Ensembl genes
>>>> track: ensGene
>>>> output format: GTF
>>>> When I click on the "Describe table schema" link I see:
>>>> "Schema for Ensembl Genes - Ensembl Genes *(lifted to sacCer2 from
>>>> Ensembl version EF 2*)"
>>>> Mali
>>>>
>>>> On Wed, Jun 29, 2011 at 11:00 PM, Brooke Rhead <[email protected]> wrote:
>>>>
>>>>> Hi Mali,
>>>>>
>>>>> Can you be more specific about how you downloaded the file from the
>>>>> Table Browser? I only see regular (not lifted) version 62 Ensembl genes
>>>>> on the UCSC sacCer2 browser.
>>>>>
>>>>> Were you by any chance on a mirror site, and not on
>>>>> http://genome.ucsc.edu/?
>>>>>
>>>>> --
>>>>> Brooke Rhead
>>>>> UCSC Genome Bioinformatics Group
>>>>>
>>>>> On 06/29/11 03:15, mali salmon wrote:
>>>>>
>>>>>> Dear Sir/Madam
>>>>>>
>>>>>> I have downloaded a gtf file for yeast ensembl genes from the ucsc
>>>>>> table browser.
>>>>>> According to the description of the table, the locations were "lifted
>>>>>> to sacCer2 from Ensembl version EF 2."
>>>>>> However, when I compare the locations I get to those from the ensembl
>>>>>> gtf file I downloaded from ensembl
>>>>>> ("Saccharomyces_cerevisiae.EF2.62.gtf"), I see that there is no
>>>>>> difference.
>>>>>> How could this be?
>>>>>> I suppose there are some changes between the two builds.
>>>>>> Looking forward to your reply
>>>>>> Thanks
>>>>>> Mali

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
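
For anyone who wants to repeat the kind of per-chromosome base count quoted in the thread, here is a minimal Python sketch. It is not the tool the UCSC engineers used (the thread does not say what that was). The function names `base_composition` and `composition_diff` are hypothetical, and the FASTA text in the demo is made up; only the chrX and chrXIV counts are copied from the tables in the thread.

```python
from collections import Counter
import io


def base_composition(fasta_handle):
    """Return {sequence_name: Counter of uppercase bases} for a FASTA stream."""
    counts = {}
    name = None
    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            counts[name] = Counter()
        elif name is not None:
            counts[name].update(line.upper())
    return counts


def composition_diff(a, b):
    """Per-base difference b - a, keeping only bases whose counts differ."""
    bases = set(a) | set(b)
    return {base: b[base] - a[base] for base in sorted(bases) if b[base] != a[base]}


# Counts copied from the tables in the thread (EF2 vs. UCSC sacCer2).
ef2_x = Counter(A=231168, C=142294, G=143873, T=228406)
saccer2_x = Counter(A=231168, C=142294, G=143873, T=228407)
ef2_xiv = Counter(A=241562, C=151655, G=151389, T=239728)
saccer2_xiv = Counter(A=241562, C=151656, G=151386, T=239729)

print(composition_diff(ef2_x, saccer2_x))      # {'T': 1}
print(composition_diff(ef2_xiv, saccer2_xiv))  # {'C': 1, 'G': -3, 'T': 1}

# Made-up FASTA text, only to show how base_composition is called.
demo = io.StringIO(">seq1 example\nACGT\nacgt\n>seq2\nTTN\n")
print(base_composition(demo)["seq1"])
```

On real data one would open each assembly's FASTA file and pass the file handle to `base_composition`, then compare matching chromosomes (for example `X` against `chrX`) with `composition_diff`. Base counts only show that two sequences differ in composition; they do not show where the differences are.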
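
The off-by-one effect described in the thread can also be checked directly on browser-style positions. The following is a small sketch under the assumption that positions are written as `chrom:start-end`, as in the SOR1 and PAU6 examples; the helper names `parse_position` and `coordinate_shift` are hypothetical, and the positions are the ones quoted in the thread.

```python
def parse_position(pos):
    """Split a 'chrom:start-end' position string into (chrom, start, end)."""
    chrom, span = pos.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


def coordinate_shift(pos_a, pos_b):
    """Return (start_shift, end_shift) of pos_b relative to pos_a."""
    chrom_a, start_a, end_a = parse_position(pos_a)
    chrom_b, start_b, end_b = parse_position(pos_b)
    if chrom_a != chrom_b:
        raise ValueError(f"different chromosomes: {chrom_a} vs {chrom_b}")
    return start_b - start_a, end_b - end_a


# (sgdGene position, ensGene position) as quoted in the thread.
examples = {
    "SOR1": ("chrX:736035-737108", "chrX:736034-737107"),
    "PAU6": ("chrXIV:781918-782280", "chrXIV:781919-782281"),
}

for gene, (sgd, ens) in examples.items():
    print(gene, coordinate_shift(sgd, ens))
# SOR1 (-1, -1)
# PAU6 (1, 1)
```

The signs agree with the base counts in the thread: sacCer2 has one extra base on chrX, so the EF2-based Ensembl position there is one lower than the SGD position, while on chrXIV sacCer2 is one base shorter and the Ensembl position is one higher.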
