Hi Mali,

The answer to your question is more complicated than I thought it would be. The sacCer2 assembly at UCSC and the EF2 assembly at Ensembl are *almost* the same build.
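One way to check how close two builds really are is to tally the length and base counts of each chromosome in both FASTA files and compare the totals. The following is only a rough Python sketch of that idea, not the script our engineers used; the function names `base_composition` and `composition_diff` and the file name in the usage comment are placeholders chosen for illustration.

```python
from collections import Counter


def base_composition(fasta_lines):
    """Return {sequence_name: (length, Counter of characters)} for FASTA lines."""
    counts = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Sequence name is the first word after '>'
            name = line[1:].split()[0]
            counts[name] = Counter()
        elif name is not None:
            # Count bases case-insensitively (soft-masked lowercase included)
            counts[name].update(line.upper())
    # Length is the total number of characters seen, including any N
    return {n: (sum(c.values()), c) for n, c in counts.items()}


def composition_diff(a, b):
    """Per-base count differences (b minus a), keeping only nonzero entries."""
    return {base: b[base] - a[base] for base in "ACGT" if b[base] != a[base]}


# Hypothetical usage (the file name is an assumption):
# with open("sacCer2.fa") as fh:
#     for name, (length, c) in base_composition(fh).items():
#         print(name, length, c["A"], c["C"], c["G"], c["T"])
```

Identical counts do not prove that two sequences are identical, but differing counts are enough to show that they are not.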
Ensembl's site says that the EF2 genome is from March 2010:
http://apr2011.archive.ensembl.org/Saccharomyces_cerevisiae/Info/Index?db=core

The UCSC sacCer2 genome "is based on sequence dated June 2008 in the Saccharomyces Genome Database (SGD)":
http://genome.ucsc.edu/cgi-bin/hgGateway?&db=sacCer2

However, there are some tiny differences between the two genome builds. One of our engineers summed it up:

---
The Ensembl EF2 sequence differs from the UCSC sacCer2 sequence in six bases on two chromosomes. UCSC has one extra T on chrX and, on chrXIV, one more C, one more T, and three fewer G.

EF2 has:
#seq    len     A       C       G       T
X       745741  231168  142294  143873  228406
XIV     784334  241562  151655  151389  239728

UCSC sacCer2 has these two chromosomes as:
#seq    len     A       C       G       T
chrX    745742  231168  142294  143873  228407
chrXIV  784333  241562  151656  151386  239729
---

So, sacCer2 and EF2 are slightly different. The gene coordinates in UCSC's Ensembl Genes track are downloaded directly from Ensembl. Since they are given in Ensembl's EF2 coordinates, some of the annotations on chromosomes 10 (chrX) and 14 (chrXIV) are off by one base when they are displayed on the sacCer2 genome browser.

You can see some examples of the problem on these two chromosomes by turning on both the "SGD Genes" track (created from data downloaded from SGD on January 30, 2009; see: http://genome.ucsc.edu/cgi-bin/hgGene?hgg_do_kgMethod=1) and the "Ensembl Genes" track (updated with each Ensembl update -- currently on version 62) in the Genome Browser. For instance:

SOR1
chrX:736035-737108    sgdGene
chrX:736034-737107    ensGene

PAU6
chrXIV:781918-782280  sgdGene
chrXIV:781919-782281  ensGene

I hope this helps you decide what data to work with. If you have further questions for us, please feel free to write back to [email protected].

--
Brooke Rhead
UCSC Genome Bioinformatics Group

On 06/30/11 11:37, mali salmon wrote:
> Thanks Brooke
> So I just want to be sure that I use the same build for the sequences and
> the annotation.
> I use the sacCer2 genome I downloaded from your ftp site, and a
> gtf file of annotated ensembl genes from the table browser (for sacCer2). Is
> this OK? Are the genome and the annotation of the same build?
> What confuses me is that there was no difference in the genomic locations
> for ensembl genes downloaded from the table browser (sacCer2 June 2008), and
> those for EF2 from the ensembl site. I thought that sacCer2 and EF2 are two
> different builds of the genome. Am I wrong?
> Thanks for your help
> Mali
>
> On Thu, Jun 30, 2011 at 7:25 PM, Brooke Rhead <[email protected]> wrote:
>
>> Hello Mali,
>>
>> I see what you are talking about now. Thank you for clarifying.
>>
>> I confirmed with our engineers that the "(lifted to sacCer2 from Ensembl
>> version EF 2)" comment was only applicable to version 59 of Ensembl Genes on
>> sacCer2. Versions 60, 61, and 62 are identical to Ensembl. The comment was
>> incorrect and has been removed.
>>
>> Thank you for alerting us to this error, and sorry for the confusion!
>>
>> --
>> Brooke Rhead
>> UCSC Genome Bioinformatics Group
>>
>> On 06/29/11 22:01, mali salmon wrote:
>>
>>> Dear Brooke
>>> Thanks for your reply. I'm not on a mirror site, but using the main UCSC
>>> site.
>>> In order to download the gtf file I went to the table browser, and chose
>>> the following:
>>> genome: S.cerevisiae
>>> assembly: June 2008, SGD/sacCer2
>>> group: Ensembl genes
>>> track: ensGene
>>> output format: GTF
>>> When I click on the "Describe table schema" link I see:
>>> "Schema for Ensembl Genes - Ensembl Genes *(lifted to sacCer2 from Ensembl
>>> version EF 2*)"
>>> Mali
>>>
>>> On Wed, Jun 29, 2011 at 11:00 PM, Brooke Rhead <[email protected]> wrote:
>>>
>>>> Hi Mali,
>>>>
>>>> Can you be more specific about how you downloaded the file from the Table
>>>> Browser? I only see regular (not lifted) version 62 Ensembl genes on the
>>>> UCSC sacCer2 browser.
>>>>
>>>> Were you by any chance on a mirror site, and not on
>>>> http://genome.ucsc.edu/?
>>>>
>>>> --
>>>> Brooke Rhead
>>>> UCSC Genome Bioinformatics Group
>>>>
>>>> On 06/29/11 03:15, mali salmon wrote:
>>>>
>>>>> Dear Sir/Madam
>>>>> I have downloaded a gtf file for yeast ensembl genes from the ucsc table
>>>>> browser.
>>>>> According to the description of the table, the locations were "lifted to
>>>>> sacCer2 from Ensembl version EF 2."
>>>>> However, when I compare the locations I get to those from the ensembl gtf
>>>>> file I downloaded from ensembl ("Saccharomyces_cerevisiae.EF2.62.gtf"),
>>>>> I see that there is no difference.
>>>>> How could this be? I suppose there are some changes between the two
>>>>> builds.
>>>>> Looking forward to your reply
>>>>> Thanks
>>>>> Mali
>>>>> _______________________________________________
>>>>> Genome maillist - [email protected]
>>>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
