Good Afternoon Peng:

There are many names for genes, assigned by a variety of organizations.
Please note the variety of "knownTo..." tables in the genome browser
database.  Each table maps the UCSC gene name to some other organization's
gene name.  Pick the table for the organization whose gene names you want.
If you want to combine several of them, perform a "join" of the
tables.  MySQL performs very poorly on a multiple join of many tables,
so it is better to dump each table to a sorted file and join the files.
Here is an example script that joins five tables efficiently:

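# Use the C locale so a plain byte-order sort matches the key order join expects.
export LC_ALL=C
# Dump the UCSC gene names, then each name,value mapping table, sorted by name.
hgsql -N -e 'SELECT name from knownGene;' hg19 | sort > hg19.knownGene.name.txt
for T in knownToRefSeq knownToPfam knownToEnsembl knownToHInv
do
     hgsql -N -e "SELECT name,value from ${T};" hg19 | sort > hg19.${T}.txt
done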

join hg19.knownGene.name.txt hg19.knownToEnsembl.txt \
         | join - hg19.knownToHInv.txt \
         | join - hg19.knownToPfam.txt \
         | join - hg19.knownToRefSeq.txt

In this example, only genes that have names in all these tables will
survive the multiple join.
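
If you would rather keep every gene and mark the missing names, the
standard join options -a, -e and -o can do that.  The following is only
a sketch on made-up data: the identifiers, the file names and the "NA"
placeholder are assumptions for illustration, not values from the hg19
tables.

```shell
#!/bin/sh
# Toy data only: these identifiers are invented for illustration.
export LC_ALL=C
tmp=$(mktemp -d)

printf 'uc001\nuc002\nuc003\n'          > "$tmp/genes.txt"
printf 'uc001\tNM_1\nuc003\tNM_3\n'     > "$tmp/toRefSeq.txt"
printf 'uc001\tENST1\nuc002\tENST2\n'   > "$tmp/toEnsembl.txt"

# Inner join: only names present in every file survive.
inner=$(join "$tmp/genes.txt" "$tmp/toRefSeq.txt" \
    | join - "$tmp/toEnsembl.txt")

# Outer join on the gene list: -a 1 keeps unmatched lines from the
# first input, -o selects the output fields (0 is the join field),
# and -e supplies the text used for a missing field.
outer=$(join -a 1 -e NA -o 0,2.2 "$tmp/genes.txt" "$tmp/toRefSeq.txt" \
    | join -a 1 -e NA -o 0,1.2,2.2 - "$tmp/toEnsembl.txt")

printf 'inner join:\n%s\n' "$inner"
printf 'outer join:\n%s\n' "$outer"

rm -rf "$tmp"
```

With more than two mapping tables, each additional join in the pipe
needs its -o list extended by one field, since -e only fills fields
that are named in -o.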

Which specific knownTo... tables to use for other naming schemes is
your choice.

--Hiram

Peng Yu wrote:
> Hi,
> 
> I know that I can map between knownGene and geneName using the tables
> refFlat and knownToRefSeq. But apparently this mapping is limited to
> the genes that have a RefSeq mapping, so some knownGene to
> geneName mappings may be lost due to this limit. Could you please let
> me know the best way to map between knownGene and geneName?

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
