Hi Mali,

The answer to your question is more complicated than I thought it would 
be.  The sacCer2 assembly at UCSC and the EF2 assembly at Ensembl are 
*almost* the same build.

Ensembl's site says that the EF2 genome is from March 2010:
http://apr2011.archive.ensembl.org/Saccharomyces_cerevisiae/Info/Index?db=core

The UCSC sacCer2 genome "is based on sequence dated June 2008 in the 
Saccharomyces Genome Database (SGD)":
http://genome.ucsc.edu/cgi-bin/hgGateway?&db=sacCer2

However, there are some tiny differences between the two genome builds.
One of our engineers summed it up:
---
The Ensembl EF2 sequence differs from the UCSC sacCer2 sequence
in six bases on two chromosomes. UCSC has one extra T on chrX,
and one more C, one more T, and three fewer G on chrXIV.

EF2 has:
#seq  len     A       C       G       T
X     745741  231168  142294  143873  228406
XIV   784334  241562  151655  151389  239728

UCSC sacCer2 has these two chromosomes as:
#seq    len     A       C       G       T
chrX    745742  231168  142294  143873  228407
chrXIV  784333  241562  151656  151386  239729
---
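
If you would like to check these numbers yourself, here is a minimal Python
sketch. It is only an illustration, not the script our engineer used: the
function names base_counts and count_diff are made up for this example, and
the counts are simply copied from the tables above. base_counts shows how
such a row could be computed from a sequence string you have already loaded,
and count_diff subtracts the EF2 row from the sacCer2 row.

```python
from collections import Counter

def base_counts(seq):
    """Return (length, A, C, G, T) for one sequence, ignoring case."""
    counts = Counter(seq.upper())
    return (len(seq), counts["A"], counts["C"], counts["G"], counts["T"])

def count_diff(first, second):
    """Column-by-column difference (second minus first) of two count rows."""
    return tuple(b - a for a, b in zip(first, second))

# Rows copied from the tables in this message: (len, A, C, G, T).
ef2 = {
    "X": (745741, 231168, 142294, 143873, 228406),
    "XIV": (784334, 241562, 151655, 151389, 239728),
}
saccer2 = {
    "X": (745742, 231168, 142294, 143873, 228407),
    "XIV": (784333, 241562, 151656, 151386, 239729),
}

for name in ef2:
    # Positive numbers mean sacCer2 has more of that base than EF2.
    print(name, count_diff(ef2[name], saccer2[name]))
# prints:
# X (1, 0, 0, 0, 1)
# XIV (-1, 0, 1, -3, 1)
```

The differences add up to the six bases mentioned above: one T on chrX, plus
one C, one T, and three G on chrXIV.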

So, sacCer2 and EF2 are slightly different.  The gene coordinates in 
UCSC's Ensembl Genes track are downloaded directly from Ensembl.  Since 
they are given in Ensembl's EF2 coordinates, some of the annotations on 
chromosomes 10 and 14 are off by one base when they are displayed on the 
sacCer2 genome browser.

You can see some examples of the problem on chromosomes 10 and 14 by 
turning on both the "SGD Genes" (created from data downloaded from SGD 
on January 30, 2009; see: 
http://genome.ucsc.edu/cgi-bin/hgGene?hgg_do_kgMethod=1) and "Ensembl 
Genes" (updated with each Ensembl update -- currently on version 62) 
tracks in the Genome Browser.  For instance:

SOR1
chrX:736035-737108 sgdGene
chrX:736034-737107 ensGene

PAU6
chrXIV:781918-782280 sgdGene
chrXIV:781919-782281 ensGene
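
To look for this kind of shift in your own data, a small Python sketch along
these lines may help. It is an assumption-laden example rather than a UCSC
tool: parse_position and shift are hypothetical helper names, and the
positions are the two examples above typed in by hand. With real data you
would read the coordinates from your own sgdGene and ensGene downloads.

```python
def parse_position(pos):
    """Split a position such as 'chrX:736035-737108' into (chrom, start, end)."""
    chrom, span = pos.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)

def shift(sgd_pos, ens_pos):
    """Return (start shift, end shift) of the ensGene position relative to sgdGene."""
    sgd_chrom, sgd_start, sgd_end = parse_position(sgd_pos)
    ens_chrom, ens_start, ens_end = parse_position(ens_pos)
    if sgd_chrom != ens_chrom:
        raise ValueError("positions are on different chromosomes")
    return ens_start - sgd_start, ens_end - sgd_end

# The two examples from this message: gene -> (sgdGene, ensGene).
examples = {
    "SOR1": ("chrX:736035-737108", "chrX:736034-737107"),
    "PAU6": ("chrXIV:781918-782280", "chrXIV:781919-782281"),
}

for gene, (sgd_pos, ens_pos) in examples.items():
    print(gene, shift(sgd_pos, ens_pos))
# prints:
# SOR1 (-1, -1)
# PAU6 (1, 1)
```

Both ends of each gene move by the same single base, so the gene lengths are
unchanged and only the position is shifted.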

I hope this helps you decide what data to work with.  If you have 
further questions for us, please feel free to write back to 
[email protected].

--
Brooke Rhead
UCSC Genome Bioinformatics Group



On 06/30/11 11:37, mali salmon wrote:
> Thanks Brooke
> So I just want to be sure that I use the same build for the sequences and
> the annotation. I use sacCer2 genome I downloaded from your ftp site, and a
> gtf file of annotated ensembl genes from the table browser (for sacCer2). Is
> this OK? Are the genome and the annotation from the same build?
> What confuses me is that there was no difference in the genomic locations
> for ensembl genes downloaded from the table browser (sacCer2 June 2008), and
> those for EF2 from ensembl site. I thought that sacCer2 and EF2 are two
> different builds of the genome. Am I wrong?
> Thanks for your help
> Mali
> 
> On Thu, Jun 30, 2011 at 7:25 PM, Brooke Rhead <[email protected]> wrote:
> 
>> Hello Mali,
>>
>> I see what you are talking about now.  Thank you for clarifying.
>>
>> I confirmed with our engineers that the "(lifted to sacCer2 from Ensembl
>> version EF 2)" comment was only applicable to version 59 of Ensembl Genes on
>> sacCer2.  Versions 60, 61, and 62 are identical to Ensembl. The comment was
>> incorrect and has been removed.
>>
>> Thank you for alerting us to this error, and sorry for the confusion!
>>
>>
>> --
>> Brooke Rhead
>> UCSC Genome Bioinformatics Group
>>
>>
>> On 06/29/11 22:01, mali salmon wrote:
>>
>>> Dear Brooke
>>> Thanks for your reply. I'm not on a mirror site, but using the main UCSC
>>> site.
>>> In order to download the gtf file I went to the table browser, and chose
>>> the following:
>>> genome: S.cerevisiae
>>> assembly: June 2008, SGD/sacCer2
>>> group: Ensembl genes
>>> track: ensGene
>>> output format: GTF
>>> When I click on the "Describe table schema" link I see:
>>> "Schema for Ensembl Genes - Ensembl Genes *(lifted to sacCer2 from Ensembl
>>> version EF 2*)"
>>> Mali
>>>
>>>
>>> On Wed, Jun 29, 2011 at 11:00 PM, Brooke Rhead <[email protected]>
>>> wrote:
>>>
>>>> Hi Mali,
>>>> Can you be more specific about how you downloaded the file from the Table
>>>> Browser?  I only see regular (not lifted) version 62 Ensembl genes on the
>>>> UCSC sacCer2 browser.
>>>>
>>>> Were you by any chance on a mirror site, and not on
>>>> http://genome.ucsc.edu/?
>>>>
>>>> --
>>>> Brooke Rhead
>>>> UCSC Genome Bioinformatics Group
>>>>
>>>>
>>>>
>>>> On 06/29/11 03:15, mali salmon wrote:
>>>>
>>>>> Dear Sir/Madam
>>>>> I have downloaded a gtf file for yeast ensembl genes from the ucsc table
>>>>> browser.
>>>>> According to the description of the table, the locations were "lifted to
>>>>> sacCer2 from Ensembl version EF 2."
>>>>> However, when I compare the locations I get to those from the ensembl gtf
>>>>> file I downloaded from ensembl ("Saccharomyces_cerevisiae.EF2.62.gtf"),
>>>>> I see that there is no difference.
>>>>> How could this be? I assumed there would be some changes between the two
>>>>> builds.
>>>>> Looking forward to your reply
>>>>> Thanks
>>>>> Mali
>>>>> _______________________________________________
>>>>> Genome maillist  -  [email protected]
>>>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>>>>
> 
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
