Dear Greg, Brooke,

just wanted to thank you for these answers, that's exactly the data I was
looking for and I could get it via the public mySQL server. Thanks!

Anton

On Thu, Mar 10, 2011 at 12:16 PM, Brooke Rhead <[email protected]> wrote:

> Hi Anton,
>
> The rn4 refGene table, referenced here:
>
>
>  inclusive format.) We are updating the rn4 table now. I expect the
>> new version to be on the public MySQL server by next Monday.
>>
>
> has now been updated and is available and is available on the public mysql
> server.  I get about 18,000 rows using the second method, as opposed to only
> about 7,500 rows with the first method.
>
>
> --
> Brooke Rhead
> UCSC Genome Bioinformatics Group
>
>
> On 03/08/11 16:12, Brooke Rhead wrote:
>
>> Hi Anton,
>>
>> The Table Browser alone is not able to do MySQL joins, so it is not
>> possible to do what you are asking within the Table Browser only. Galaxy,
>> which works in conjunction with the Table Browser, can do joins (as Greg
>> described).
>>
>> There is another solution, however:  we have a public MySQL server that
>> you can query directly.  Instructions for using it are here:
>>
>> http://genome.ucsc.edu/FAQ/FAQdownloads.html#download29
>>
>> Greg came up with a MySQL query equivalent to the Galaxy instructions:
>>
>> SELECT mm9.refGene.name AS "mm9GeneID",
>>        mm9.refGene.name2 AS "mm9GeneName",
>>        rn4.kgXref.geneSymbol AS "rn4GeneName",
>>        rn4.refGene.name AS "rn4GeneID"
>>   FROM  mm9.refGene, rn4.refGene, rn4.kgXref
>>   WHERE mm9.refGene.name2 = rn4.kgXref.geneSymbol
>>      AND   rn4.kgXref.refSeq = rn4.refGene.name
>>   ORDER BY mm9GeneName;
>>
>> This should get you a table that looks like:
>>
>> +--------------+---------------+---------------+--------------+
>> | mm9GeneID    | mm9GeneName   | rn4GeneName   | rn4GeneID    |
>> +--------------+---------------+---------------+--------------+
>> | NM_020003    | 0610031J06Rik | 0610031j06rik | NM_001004226 |
>> | NM_001081067 | A1bg          | A1bg          | NM_022258    |
>> | NM_175628    | A2m           | A2m           | NM_012488    |
>> | NM_030210    | Aacs          | Aacs          | NM_023104    |
>> ...
>>
>> It will have ~7500 rows in it.
>>
>> I want to point out that you may want a slightly different query, though:
>>
>> SELECT mm9.refGene.name AS "mm9GeneID",
>>        mm9.refGene.name2 AS "mm9GeneName",
>>        rn4.refGene.name2 AS "rn4GeneName",
>>        rn4.refGene.name AS "rn4GeneID"
>>   FROM  mm9.refGene, rn4.refGene
>>   WHERE mm9.refGene.name2 = rn4.refGene.name2
>>   ORDER BY mm9GeneName;
>>
>> This one will use the gene symbols from the refGene tables directly, which
>> might be preferable, as this data is updated nightly and is not tied to our
>> UCSC Genes track, which may not contain everything in the RefSeq Genes
>> track.
>>
>> However, there is currently a problem with this approach: the rn4 refGene
>> table is in a different format that does not include the 'name2' field.
>>  (The rn4 table was initially made before we switched to a more inclusive
>> format.)  We are updating the rn4 table now.  I expect the new version to be
>> on the public MySQL server by next Monday.
>>
>> I hope this is helpful.  If you have further questions, please feel free
>> to contact us again at [email protected].
>>
>> --
>> Brooke Rhead
>> UCSC Genome Bioinformatics Group
>>
>>
>> On 03/07/11 20:07, Anton Kratz wrote:
>>
>>> Hi Greg,
>>>
>>> thanks; I do not wish to use Galaxy though, as I need such a table of
>>> mouse
>>> RefSeq <-> rat RefSeq as part of an analysis pipeline for comparisons
>>> between many such lists, which I have written in Perl and which in turn
>>> call
>>> other scripts, R, etc... so including a manual step involving Galaxy is
>>> not
>>> feasible.
>>>
>>> What I need is a plain, tab-separated list of mouse RefSeqs and their
>>> corresping rat RefSeqs, or vice versa.
>>>
>>> Could you please explain a little more detailed how to get such a list
>>> with
>>> the table browser?
>>>
>>> What I tried, in the table browser, I select:
>>> clade: Mammal, genome: Mouse, assembly: July 2007 (NCBI37/mm9)
>>> group: Genes and Gene Prediction Tracks, track: RefSeq Genes
>>> table: kgXref
>>> output format: selected fields from primary and related tables
>>>
>>> Then click on Get Output.
>>>
>>> On the next page (with areas "Select Fields from mm9.kgXref" and "Linked
>>> Tables"), is where I am stuck.
>>>
>>> Anton
>>>
>>> On Tue, Mar 8, 2011 at 8:27 AM, Greg Roe <[email protected]> wrote:
>>>
>>>   Hi Anton,
>>>>
>>>> Yes, you can use the table browser (with the refSeq > kgXref table) to
>>>> get
>>>> the gene symbols for both lists and compare.  Though, Galaxy makes this
>>>> a
>>>> bit more automated (http://main.g2.bx.psu.edu/).
>>>>
>>>> Go to Galaxy and click Get Data > UCSC Main<
>>>> http://genome.ucsc.edu/cgi-bin/hgTables?GALAXY_URL=http%3A//main.g2.bx.psu.edu/tool_runner&tool_id=ucsc_table_direct1&hgta_compressType=none&sendToGalaxy=1&hgta_outputType=bed>table
>>>> browser (upper left).  Select the genome, assembly, etc, for rat and
>>>> use refSeq with the kgXref Table.  It will show you UCSC's table browser
>>>> in
>>>> the process, and paste your list of gene identifiers in. The "send
>>>> output to
>>>> Galaxy" box should  be checked by default. When you click output it will
>>>> allow you to send the output to Galaxy. Then do the same for mouse.
>>>>  Once
>>>> both data sets are there, you can use Galaxy's "Join, Subtract and
>>>> Group"
>>>> tool to compare the data sets ("Compare Two Data sets" tool). Just join
>>>> them
>>>> on the geneSymbol column. Your output should then be a list of genes
>>>> they
>>>> have in common.
>>>>
>>>> Let me know if you have any additional questions.
>>>>
>>>> -
>>>> Greg Roe
>>>> UCSC Genome Bioinformatics Group
>>>>
>>>>
>>>>
>>>> On 3/7/11 2:25 AM, Anton Kratz wrote:
>>>>
>>>> Dear UCSC team,
>>>>
>>>> I have a list of some RefSeq-genes in mouse, and a list of some
>>>> RefSeq-genes
>>>> in rat.
>>>>
>>>> I want to find out which genes are present in both lists. But I can not
>>>> just
>>>> compare the identifiers, as the genes have different IDs in the two
>>>> species.
>>>>
>>>> For example, Grid2 (glutamate receptor, ionotropic, delta 2) in rat
>>>> would
>>>> have the identifier NM_024379 and be located at chr4:92642427-94054757,
>>>> while in mouse it would be NM_008167 at chr6:63206851-64616273. Still
>>>> NM_024379 in rat is the "same" as NM_008167 in mouse.
>>>>
>>>> What is the best approach to find the same genes in two different lists
>>>> of
>>>> mouse and rat as described? Is it possible to get a list of equivalent
>>>> RefSeq identifiers in mouse and rat throught he table browser?
>>>>
>>>> Thanks,
>>>> Anton
>>>> _______________________________________________
>>>> Genome maillist  -  [email protected]
>>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>>>
>>>>
>>>>  _______________________________________________
>>> Genome maillist  -  [email protected]
>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>>
>> _______________________________________________
>> Genome maillist  -  [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>
>
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome

Reply via email to