Hello, Hani.

In your text file, items 2 and 3 have the gene symbol WASH7P.  Let's use
WASH7P as an example.  If you view these items in the UCSC Genes track in
the Browser, you will see that there are multiple transcript variants of
WASH7P, some that span chr1:14000-19000 and some that span chr1:14000-29000.
As a result, there are multiple entries in our canonical set with the same
gene symbol.  The previously-answered mailing list question at
https://lists.soe.ucsc.edu/pipermail/genome/2005-July/008123.html describes
the process of selecting transcripts as members of our canonical set.  As
you discovered, it isn't a perfect system and it's something we're currently
working on revising.

There is no simple way to filter out the isoforms in the Table Browser as you
suggest.  The easiest approach is probably to post-process the output,
scanning it and removing items with duplicate gene symbols.
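
As a rough illustration of that kind of post-output filtering, here is a
minimal Python sketch.  It assumes the Table Browser output was saved as
tab-separated text with a header line, and that the columns are named chrom,
chromStart, chromEnd and geneSymbol; those column names, the function name
dedupe_by_symbol, the second gene and the choice to keep the longest item per
symbol are assumptions made for the example, so adjust them to match your
file.

```python
import csv
import io


def dedupe_by_symbol(tsv_text, symbol_col="geneSymbol",
                     start_col="chromStart", end_col="chromEnd"):
    """Keep one row per gene symbol: the row with the longest genomic span.

    Column names are assumptions; pass the names used in your own output.
    """
    # Table Browser headers usually start with '#'; drop it so the first
    # column name is read cleanly.
    reader = csv.DictReader(io.StringIO(tsv_text.lstrip("#")), delimiter="\t")
    best = {}  # gene symbol -> longest row seen so far
    for row in reader:
        symbol = row[symbol_col]
        span = int(row[end_col]) - int(row[start_col])
        kept = best.get(symbol)
        if kept is None or span > int(kept[end_col]) - int(kept[start_col]):
            best[symbol] = row
    return list(best.values())


# Illustrative input only: the two WASH7P spans quoted above plus a made-up gene.
example = (
    "#chrom\tchromStart\tchromEnd\tgeneSymbol\n"
    "chr1\t14000\t19000\tWASH7P\n"
    "chr1\t14000\t29000\tWASH7P\n"
    "chr1\t50000\t51000\tGENE_B\n"
)

for row in dedupe_by_symbol(example):
    print(row["geneSymbol"], row["chrom"], row["chromStart"], row["chromEnd"])
```

Keeping the longest item is only one possible rule; keeping the first item
seen for each symbol would work just as well if you have no preference.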

Please contact us again at [email protected] if you have any further
questions.

---
Steve Heitner
UCSC Genome Bioinformatics Group


-----Original Message-----
From: Hani Choudhry [mailto:[email protected]]
Sent: 31 July 2012 17:39
To: '[email protected]'
Subject: RE: [Genome] Protein coding genes list without isoforms

Hi Steve,
Thanks for your reply.  I tried the steps you suggested, but I am still
getting isoforms in the final gene list (attached).  Is there any way to
filter out the isoforms and get a list of unique protein-coding genes only?
Regards,
Hani Choudhry



-----Original Message-----
From: Steve Heitner [mailto:[email protected]]
Sent: 31 July 2012 16:55
To: [email protected]; [email protected]
Subject: RE: [Genome] Protein coding genes list without isoforms

Hello, Hani.

Based on what you've described, it sounds like the UCSC Genes knownCanonical
and knownIsoforms tables contain the information you're looking for.  To get
the canonical gene list, perform the following steps:

1. Navigate to http://genome.ucsc.edu/cgi-bin/hgTables

2. Select the following options:
Clade: Mammal
Genome: Human
Assembly: Feb. 2009 (GRCh37/hg19)
Group: Genes and Gene Prediction Tracks
Track: UCSC Genes
Table: knownCanonical
Region: Select "genome" for the entire genome or specify a position in the
"position" box.  You can also specify multiple loci by clicking the "define
regions" button.
Output format: Select "all fields from selected table" to list every field
from the table in your output.  Select "selected fields from primary and
related tables" to specify which fields should be included in your output.

3. Click the "get output" button

The knownIsoforms table combines all isoforms of a single gene into
"clusters" and lists each cluster along with the isoforms that belong to
that cluster.  Note that when you select this table in the Table Browser, it
displays the contents of the entire genome by default.  You do not have the
option of specifying individual genomic regions.  To display the contents of
this table, simply change the table to knownIsoforms in step 2 above.
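
If it helps to see the cluster structure concretely, the following Python
sketch groups the rows of a knownIsoforms download by cluster.  It assumes
the output is tab-separated with the cluster ID in the first column and the
transcript ID in the second; the function name group_isoforms and the
transcript IDs in the example are placeholders, not real entries.

```python
from collections import defaultdict


def group_isoforms(tsv_text):
    """Map each cluster ID to the list of transcript IDs in that cluster.

    Assumes two tab-separated columns per line: cluster ID, transcript ID.
    """
    clusters = defaultdict(list)
    for line in tsv_text.splitlines():
        # Skip blank lines and the '#'-prefixed header line.
        if not line or line.startswith("#"):
            continue
        cluster_id, transcript = line.split("\t")[:2]
        clusters[cluster_id].append(transcript)
    return dict(clusters)


# Placeholder data: cluster 1 has two isoforms, cluster 2 has one.
example_isoforms = (
    "#clusterId\ttranscript\n"
    "1\ttxA.1\n"
    "1\ttxA.2\n"
    "2\ttxB.1\n"
)

for cluster_id, transcripts in group_isoforms(example_isoforms).items():
    print(cluster_id, ", ".join(transcripts))
```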

Please contact us again at [email protected] if you have any further
questions.

---
Steve Heitner
UCSC Genome Bioinformatics Group

-----Original Message-----
From: [email protected] [mailto:[email protected]] On
Behalf Of [email protected]
Sent: Tuesday, July 31, 2012 2:01 AM
To: [email protected]
Subject: [Genome] Protein coding genes list without isoforms

Dear UCSC Genome Browser,

I would like to get a list of all RefSeq protein-coding genes (name, strand,
chromosomal location, and sequences).  I tried to get them from the Tables
option, but the output included several isoforms per gene.  Is there an option
to remove the isoforms and keep one canonical entry per gene?  Also, do you
provide a meta-gene list (all isoforms of a gene combined into a single meta
gene) for protein-coding genes?

Thanks for your attention and help.  I look forward to your reply.

Regards,
Hani Choudhry


_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome


