Hi Anton,

The Table Browser alone is not able to do MySQL joins, so it is not 
possible to do what you are asking within the Table Browser only. 
Galaxy, which works in conjunction with the Table Browser, can do joins 
(as Greg described).

There is another solution, however:  we have a public MySQL server that 
you can query directly.  Instructions for using it are here:

http://genome.ucsc.edu/FAQ/FAQdownloads.html#download29

Greg came up with a MySQL query equivalent to the Galaxy instructions:

SELECT mm9.refGene.name AS "mm9GeneID",
        mm9.refGene.name2 AS "mm9GeneName",
        rn4.kgXref.geneSymbol AS "rn4GeneName",
        rn4.refGene.name AS "rn4GeneID"
   FROM  mm9.refGene, rn4.refGene, rn4.kgXref
   WHERE mm9.refGene.name2 = rn4.kgXref.geneSymbol
      AND   rn4.kgXref.refSeq = rn4.refGene.name
   ORDER BY mm9GeneName;

This should get you a table that looks like:

+--------------+---------------+---------------+--------------+
| mm9GeneID    | mm9GeneName   | rn4GeneName   | rn4GeneID    |
+--------------+---------------+---------------+--------------+
| NM_020003    | 0610031J06Rik | 0610031j06rik | NM_001004226 |
| NM_001081067 | A1bg          | A1bg          | NM_022258    |
| NM_175628    | A2m           | A2m           | NM_012488    |
| NM_030210    | Aacs          | Aacs          | NM_023104    |
...

It will have ~7500 rows in it.

I want to point out that you may want a slightly different query, though:

SELECT mm9.refGene.name AS "mm9GeneID",
        mm9.refGene.name2 AS "mm9GeneName",
        rn4.refGene.name2 AS "rn4GeneName",
        rn4.refGene.name AS "rn4GeneID"
   FROM  mm9.refGene, rn4.refGene
   WHERE mm9.refGene.name2 = rn4.refGene.name2
   ORDER BY mm9GeneName;

This one will use the gene symbols from the refGene tables directly, 
which might be preferable, as this data is updated nightly and is not 
tied to our UCSC Genes track, which may not contain everything in the 
RefSeq Genes track.

However, there is currently a problem with this approach: the rn4 
refGene table is in a different format that does not include the 'name2' 
field.  (The rn4 table was initially made before we switched to a more 
inclusive format.)  We are updating the rn4 table now.  I expect the new 
version to be on the public MySQL server by next Monday.

I hope this is helpful.  If you have further questions, please feel free 
to contact us again at [email protected].

--
Brooke Rhead
UCSC Genome Bioinformatics Group


On 03/07/11 20:07, Anton Kratz wrote:
> Hi Greg,
> 
> thanks; I do not wish to use Galaxy though, as I need such a table of mouse
> RefSeq <-> rat RefSeq as part of an analysis pipeline for comparisons
> between many such lists, which I have written in Perl and which in turn call
> other scripts, R, etc... so including a manual step involving Galaxy is not
> feasible.
> 
> What I need is a plain, tab-separated list of mouse RefSeqs and their
> corresping rat RefSeqs, or vice versa.
> 
> Could you please explain a little more detailed how to get such a list with
> the table browser?
> 
> What I tried, in the table browser, I select:
> clade: Mammal, genome: Mouse, assembly: July 2007 (NCBI37/mm9)
> group: Genes and Gene Prediction Tracks, track: RefSeq Genes
> table: kgXref
> output format: selected fields from primary and related tables
> 
> Then click on Get Output.
> 
> On the next page (with areas "Select Fields from mm9.kgXref" and "Linked
> Tables"), is where I am stuck.
> 
> Anton
> 
> On Tue, Mar 8, 2011 at 8:27 AM, Greg Roe <[email protected]> wrote:
> 
>>  Hi Anton,
>>
>> Yes, you can use the table browser (with the refSeq > kgXref table) to get
>> the gene symbols for both lists and compare.  Though, Galaxy makes this a
>> bit more automated (http://main.g2.bx.psu.edu/).
>>
>> Go to Galaxy and click Get Data > UCSC 
>> Main<http://genome.ucsc.edu/cgi-bin/hgTables?GALAXY_URL=http%3A//main.g2.bx.psu.edu/tool_runner&tool_id=ucsc_table_direct1&hgta_compressType=none&sendToGalaxy=1&hgta_outputType=bed>table
>>  browser (upper left).  Select the genome, assembly, etc, for rat and
>> use refSeq with the kgXref Table.  It will show you UCSC's table browser in
>> the process, and paste your list of gene identifiers in. The "send output to
>> Galaxy" box should  be checked by default. When you click output it will
>> allow you to send the output to Galaxy. Then do the same for mouse.  Once
>> both data sets are there, you can use Galaxy's "Join, Subtract and Group"
>> tool to compare the data sets ("Compare Two Data sets" tool). Just join them
>> on the geneSymbol column. Your output should then be a list of genes they
>> have in common.
>>
>> Let me know if you have any additional questions.
>>
>> -
>> Greg Roe
>> UCSC Genome Bioinformatics Group
>>
>>
>>
>> On 3/7/11 2:25 AM, Anton Kratz wrote:
>>
>> Dear UCSC team,
>>
>> I have a list of some RefSeq-genes in mouse, and a list of some RefSeq-genes
>> in rat.
>>
>> I want to find out which genes are present in both lists. But I can not just
>> compare the identifiers, as the genes have different IDs in the two species.
>>
>> For example, Grid2 (glutamate receptor, ionotropic, delta 2) in rat would
>> have the identifier NM_024379 and be located at chr4:92642427-94054757,
>> while in mouse it would be NM_008167 at chr6:63206851-64616273. Still
>> NM_024379 in rat is the "same" as NM_008167 in mouse.
>>
>> What is the best approach to find the same genes in two different lists of
>> mouse and rat as described? Is it possible to get a list of equivalent
>> RefSeq identifiers in mouse and rat throught he table browser?
>>
>> Thanks,
>> Anton
>> _______________________________________________
>> Genome maillist  -  [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>
>>
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome

Reply via email to