Dear Greg, Brooke, just wanted to thank you for these answers, that's exactly the data I was looking for and I could get it via the public mySQL server. Thanks!
Anton On Thu, Mar 10, 2011 at 12:16 PM, Brooke Rhead <[email protected]> wrote: > Hi Anton, > > The rn4 refGene table, referenced here: > > > inclusive format.) We are updating the rn4 table now. I expect the >> new version to be on the public MySQL server by next Monday. >> > > has now been updated and is available and is available on the public mysql > server. I get about 18,000 rows using the second method, as opposed to only > about 7,500 rows with the first method. > > > -- > Brooke Rhead > UCSC Genome Bioinformatics Group > > > On 03/08/11 16:12, Brooke Rhead wrote: > >> Hi Anton, >> >> The Table Browser alone is not able to do MySQL joins, so it is not >> possible to do what you are asking within the Table Browser only. Galaxy, >> which works in conjunction with the Table Browser, can do joins (as Greg >> described). >> >> There is another solution, however: we have a public MySQL server that >> you can query directly. Instructions for using it are here: >> >> http://genome.ucsc.edu/FAQ/FAQdownloads.html#download29 >> >> Greg came up with a MySQL query equivalent to the Galaxy instructions: >> >> SELECT mm9.refGene.name AS "mm9GeneID", >> mm9.refGene.name2 AS "mm9GeneName", >> rn4.kgXref.geneSymbol AS "rn4GeneName", >> rn4.refGene.name AS "rn4GeneID" >> FROM mm9.refGene, rn4.refGene, rn4.kgXref >> WHERE mm9.refGene.name2 = rn4.kgXref.geneSymbol >> AND rn4.kgXref.refSeq = rn4.refGene.name >> ORDER BY mm9GeneName; >> >> This should get you a table that looks like: >> >> +--------------+---------------+---------------+--------------+ >> | mm9GeneID | mm9GeneName | rn4GeneName | rn4GeneID | >> +--------------+---------------+---------------+--------------+ >> | NM_020003 | 0610031J06Rik | 0610031j06rik | NM_001004226 | >> | NM_001081067 | A1bg | A1bg | NM_022258 | >> | NM_175628 | A2m | A2m | NM_012488 | >> | NM_030210 | Aacs | Aacs | NM_023104 | >> ... >> >> It will have ~7500 rows in it. >> >> I want to point out that you may want a slightly different query, though: >> >> SELECT mm9.refGene.name AS "mm9GeneID", >> mm9.refGene.name2 AS "mm9GeneName", >> rn4.refGene.name2 AS "rn4GeneName", >> rn4.refGene.name AS "rn4GeneID" >> FROM mm9.refGene, rn4.refGene >> WHERE mm9.refGene.name2 = rn4.refGene.name2 >> ORDER BY mm9GeneName; >> >> This one will use the gene symbols from the refGene tables directly, which >> might be preferable, as this data is updated nightly and is not tied to our >> UCSC Genes track, which may not contain everything in the RefSeq Genes >> track. >> >> However, there is currently a problem with this approach: the rn4 refGene >> table is in a different format that does not include the 'name2' field. >> (The rn4 table was initially made before we switched to a more inclusive >> format.) We are updating the rn4 table now. I expect the new version to be >> on the public MySQL server by next Monday. >> >> I hope this is helpful. If you have further questions, please feel free >> to contact us again at [email protected]. >> >> -- >> Brooke Rhead >> UCSC Genome Bioinformatics Group >> >> >> On 03/07/11 20:07, Anton Kratz wrote: >> >>> Hi Greg, >>> >>> thanks; I do not wish to use Galaxy though, as I need such a table of >>> mouse >>> RefSeq <-> rat RefSeq as part of an analysis pipeline for comparisons >>> between many such lists, which I have written in Perl and which in turn >>> call >>> other scripts, R, etc... so including a manual step involving Galaxy is >>> not >>> feasible. >>> >>> What I need is a plain, tab-separated list of mouse RefSeqs and their >>> corresping rat RefSeqs, or vice versa. >>> >>> Could you please explain a little more detailed how to get such a list >>> with >>> the table browser? >>> >>> What I tried, in the table browser, I select: >>> clade: Mammal, genome: Mouse, assembly: July 2007 (NCBI37/mm9) >>> group: Genes and Gene Prediction Tracks, track: RefSeq Genes >>> table: kgXref >>> output format: selected fields from primary and related tables >>> >>> Then click on Get Output. >>> >>> On the next page (with areas "Select Fields from mm9.kgXref" and "Linked >>> Tables"), is where I am stuck. >>> >>> Anton >>> >>> On Tue, Mar 8, 2011 at 8:27 AM, Greg Roe <[email protected]> wrote: >>> >>> Hi Anton, >>>> >>>> Yes, you can use the table browser (with the refSeq > kgXref table) to >>>> get >>>> the gene symbols for both lists and compare. Though, Galaxy makes this >>>> a >>>> bit more automated (http://main.g2.bx.psu.edu/). >>>> >>>> Go to Galaxy and click Get Data > UCSC Main< >>>> http://genome.ucsc.edu/cgi-bin/hgTables?GALAXY_URL=http%3A//main.g2.bx.psu.edu/tool_runner&tool_id=ucsc_table_direct1&hgta_compressType=none&sendToGalaxy=1&hgta_outputType=bed>table >>>> browser (upper left). Select the genome, assembly, etc, for rat and >>>> use refSeq with the kgXref Table. It will show you UCSC's table browser >>>> in >>>> the process, and paste your list of gene identifiers in. The "send >>>> output to >>>> Galaxy" box should be checked by default. When you click output it will >>>> allow you to send the output to Galaxy. Then do the same for mouse. >>>> Once >>>> both data sets are there, you can use Galaxy's "Join, Subtract and >>>> Group" >>>> tool to compare the data sets ("Compare Two Data sets" tool). Just join >>>> them >>>> on the geneSymbol column. Your output should then be a list of genes >>>> they >>>> have in common. >>>> >>>> Let me know if you have any additional questions. >>>> >>>> - >>>> Greg Roe >>>> UCSC Genome Bioinformatics Group >>>> >>>> >>>> >>>> On 3/7/11 2:25 AM, Anton Kratz wrote: >>>> >>>> Dear UCSC team, >>>> >>>> I have a list of some RefSeq-genes in mouse, and a list of some >>>> RefSeq-genes >>>> in rat. >>>> >>>> I want to find out which genes are present in both lists. But I can not >>>> just >>>> compare the identifiers, as the genes have different IDs in the two >>>> species. >>>> >>>> For example, Grid2 (glutamate receptor, ionotropic, delta 2) in rat >>>> would >>>> have the identifier NM_024379 and be located at chr4:92642427-94054757, >>>> while in mouse it would be NM_008167 at chr6:63206851-64616273. Still >>>> NM_024379 in rat is the "same" as NM_008167 in mouse. >>>> >>>> What is the best approach to find the same genes in two different lists >>>> of >>>> mouse and rat as described? Is it possible to get a list of equivalent >>>> RefSeq identifiers in mouse and rat throught he table browser? >>>> >>>> Thanks, >>>> Anton >>>> _______________________________________________ >>>> Genome maillist - [email protected] >>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome >>>> >>>> >>>> _______________________________________________ >>> Genome maillist - [email protected] >>> https://lists.soe.ucsc.edu/mailman/listinfo/genome >>> >> _______________________________________________ >> Genome maillist - [email protected] >> https://lists.soe.ucsc.edu/mailman/listinfo/genome >> > _______________________________________________ Genome maillist - [email protected] https://lists.soe.ucsc.edu/mailman/listinfo/genome
