Hello Kathleen,

Apologies for the delay in replying. One of our developers had this to 
say about your issue:

In general, hgFixed contains data that are not anchored to a genomic 
position. They may be referenced to a probe ID instead. The probe ID 
may map to a different location in hg18, hg19, etc., but the expression 
scores associated with the probe are independent of where it maps in a 
particular genome assembly. hgFixed may also contain data that we are 
not able to place on a particular assembly like hg19. If we can't place 
it, we can't display it in hgTracks, so we don't include it in our 
assembly-specific database tables.

Indeed, hg19.knownToGnfAtlas2 has fewer than 20,000 of the more than 
44,000 probe names in hgFixed.gnfHumanAtlas2All, and the number of 
probes from gnfHumanAtlas2All that were mapped to hg19 is ~33,000:

mysql> select count(distinct(value)) from hg19.knownToGnfAtlas2;
+------------------------+
| count(distinct(value)) |
+------------------------+
|                  19624 |
+------------------------+

mysql> select count(distinct(name)) from hgFixed.gnfHumanAtlas2All;
+-----------------------+
| count(distinct(name)) |
+-----------------------+
|                 44775 |
+-----------------------+

mysql> select count(distinct(name)) from hg19.gnfAtlas2;
+-----------------------+
| count(distinct(name)) |
+-----------------------+
|                 33186 |
+-----------------------+

The drop from 44,000 to 33,000 is most likely because some probe 
sequences didn't map, or because we needed locations from the chip 
vendors but didn't get them.
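
As a rough illustration of how one could list the probes behind the 
drop from 44,000 to 33,000, here is a minimal sketch of an anti-join. 
It uses Python's sqlite3 module with small in-memory tables standing in 
for hgFixed.gnfHumanAtlas2All and hg19.gnfAtlas2. Only the "name" 
column is modelled, and the probe ID "hypothetical_unplaced_at" is made 
up for the example. Against the real MySQL server the same query shape 
should apply with the database-qualified table names.

```python
import sqlite3

# Toy stand-ins for hgFixed.gnfHumanAtlas2All and hg19.gnfAtlas2.
# Only the "name" column is modelled; 'hypothetical_unplaced_at' is a
# made-up probe ID used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gnfHumanAtlas2All (name TEXT);
    CREATE TABLE gnfAtlas2 (name TEXT);
    INSERT INTO gnfHumanAtlas2All VALUES
        ('214320_x_at'), ('207244_x_at'), ('hypothetical_unplaced_at');
    INSERT INTO gnfAtlas2 VALUES ('214320_x_at'), ('207244_x_at');
""")

# Anti-join: keep expression probes that have no row in the
# assembly-specific placement table.
unplaced = [row[0] for row in conn.execute("""
    SELECT DISTINCT e.name
    FROM gnfHumanAtlas2All AS e
    LEFT JOIN gnfAtlas2 AS p ON p.name = e.name
    WHERE p.name IS NULL
    ORDER BY e.name
""")]
print(unplaced)
```

The LEFT JOIN keeps every expression probe, and the IS NULL filter 
leaves only those with no placement on the assembly.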

The drop from 33,000 to <20,000 is because several probes may map to 
approximately the same location. For example, hg19 
chr19:41,347,723-41,358,062 has UCSC Gene uc002opl.3, which is covered 
by four different probes in the gnfAtlas2 table. Only a single probe is 
selected for knownToGnfAtlas2:

mysql> select chrom,chromStart,chromEnd,name,score from gnfAtlas2 where 
chrom = "chr19" and chromEnd > 41349443 and chromStart < 41356352 limit 10;
+-------+------------+----------+-------------+-------+
| chrom | chromStart | chromEnd | name        | score |
+-------+------------+----------+-------------+-------+
| chr19 |   41349438 | 41356331 | 214320_x_at |   958 |
| chr19 |   41349443 | 41356339 | 207244_x_at |   998 |
| chr19 |   41349446 | 41356339 | 1494_f_at   |   998 |
| chr19 |   41349566 | 41356339 | 211295_x_at |   988 |
+-------+------------+----------+-------------+-------+

mysql> select * from knownToGnfAtlas2 where name = "uc002opl.3";
+------------+-------------+
| name       | value       |
+------------+-------------+
| uc002opl.3 | 207244_x_at |
+------------+-------------+
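
Because only one probe per gene is recorded in knownToGnfAtlas2, a 
probe-to-gene lookup through that table leaves the other three probes 
without a match. The sketch below reproduces this with Python's sqlite3 
module, using in-memory tables that hold just the rows shown above. The 
toy tables are a simplification (gnfAtlas2 is reduced to its "name" 
column), so treat it as an illustration and not as a description of how 
the tables are built.

```python
import sqlite3

# In-memory stand-ins holding only the rows shown in the queries above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gnfAtlas2 (name TEXT);
    CREATE TABLE knownToGnfAtlas2 (name TEXT, value TEXT);
    INSERT INTO gnfAtlas2 VALUES
        ('214320_x_at'), ('207244_x_at'), ('1494_f_at'), ('211295_x_at');
    INSERT INTO knownToGnfAtlas2 VALUES ('uc002opl.3', '207244_x_at');
""")

# Left join from probes to the gene-to-probe table: probes that were
# not the one selected for the gene come back with None.
probe_to_gene = dict(conn.execute("""
    SELECT p.name, k.name
    FROM gnfAtlas2 AS p
    LEFT JOIN knownToGnfAtlas2 AS k ON k.value = p.name
"""))
for probe, gene in sorted(probe_to_gene.items()):
    print(probe, gene)
```

Only 207244_x_at is paired with uc002opl.3. The remaining three probes 
have no gene in this lookup even though they overlap the same gene.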

Hopefully this information is helpful and answers your question. If you 
have further questions or require clarification, feel free to contact 
the mailing list at [email protected].

Regards,

Pauline Fujita
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

> Dear Luvina and Genome group,
> I do apologize for the repeated emails, but I continue having trouble
> resolving one problem. Of the approximately 44,000 probe IDs for which
> there is expression data in the hgFixed database (table
> hgFixed.gnfHumanAtlas2All), I have only been able to obtain
> corresponding gene symbols for approximately 20,000 using the
> recommendations provided by Luvina in a previous email (see below). If I
> understand correctly, hgFixed and hg19 are separate databases. Thus,
> when I match probes in the hgFixed.gnfHumanAtlas2All table to the
> probes in the hg19.knownToGnfAtlas2 table, I am picking up the gene
> mapping information for only the probes that are common to both
> databases. However, there seem to be approximately 24,000 probes that
> are in the hgFixed database but are not in the hg19 probe data tables
> (I have actually run manual merges on the hgFixed probes with each of
> the hg19.knownToGnfAtlas2, hg19.knownToU133Plus2, hg19.knownToU133 and
> hg19.knownToU95 tables and picked up an additional 900 or so.) There
> do not appear to be any probe mapping tables (i.e., tables with
> chromosomal location and/or genes) that are specific to the hgFixed
> probe set. Is this correct?
> 
> Interestingly, the Gene Annot database
> (http://genecards.weizmann.ac.il/geneannot/index.shtml) does contain
> gene symbol annotation information, supposedly via the U133Plus2.0
> probe set, for many of the probes that do not have gene information in
> the hg19.knownToU133Plus2 table through UCSC Genome. Have there been
> updates to that probe set that have not been incorporated into your
> database? (I'd use Gene Annot to obtain the information for the 24,000
> remaining probes, but the website limits batch size to 400 probes at a
> time and the html output is a bit cumbersome to work with, so would
> prefer a simpler solution if one exists.)
> 
> Thanks again for your valuable assistance.
> Kathleen
> 
> 
> On Fri, Jun 10, 2011 at 6:31 PM, Luvina Guruvadoo <[email protected]> wrote:
>> Hi Kathleen,
>>
>> You will first need to retrieve UCSC IDs from the hg19.knownToGnfAtlas2
>> table, then use hg19.kgXref to retrieve the corresponding gene names. I
>> suggest you do this using the Table Browser. Make the following selections:
>>
>> table: knownToGnfAtlas2
>> output format: selected fields from primary and related tables
>>
>> Click 'get output'. On the following page, select the 'name' and 'value'
>> fields, scroll down and select 'hg19 kgXref' and click 'Allow Selection From
>> Checked Tables'. From here, you can select 'geneSymbol' then click 'get
>> output'.
>>
>> I hope this helps. Please contact us again at [email protected] if you
>> have any further questions.
>> ---
>> Luvina Guruvadoo
>> UCSC Genome Bioinformatics Group
>>
>>
>>
>> kathleen askland wrote:
>>> Hello,
>>> I have an additional question pertaining to the GNF Atlas 2 expression
>>> data available through UCSC.
>>> If I download the hgFixed.gnfHumanAtlas2All table, and join the
>>> hg19.kgXref.geneSymbol field from the hg19.kgXref table, will the
>>> matching of probe to gene symbol be correct even though the two tables
>>> are from different databases?
>>> Thank you,
>>> Kathleen
>>>
>>>
>>>
>>
> 
> 
> 

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
