Hi Carsten,

Unfortunately, there is no easy way to tell which UCSC Genes are alignments 
of the same RNA. The only way to tell would be to use the kgXref table, which 
links mRNAs to UCSC Gene identifiers. You would then have to look for genes 
whose associated mRNA is also associated with another gene.
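Here is a minimal Python sketch of that approach. It assumes a tab-separated dump of kgXref with the UCSC Gene ID (kgID) in the first column and the mRNA accession in the second; please check that column order against the table schema for your assembly. The sketch groups UCSC Gene IDs by mRNA and reports every mRNA linked to more than one gene. The function name and the toy rows are hypothetical.

```python
from collections import defaultdict


def shared_mrna_genes(kgxref_lines):
    """Return {mRNA: [UCSC Gene IDs]} for mRNAs linked to more than one gene.

    Assumption: each line is a tab-separated kgXref row with kgID in
    column 0 and the mRNA accession in column 1.
    """
    genes_by_mrna = defaultdict(set)
    for line in kgxref_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[1]:
            continue  # no mRNA accession on this row
        kg_id, mrna = fields[0], fields[1]
        genes_by_mrna[mrna].add(kg_id)
    # Keep only mRNAs associated with two or more distinct UCSC Genes.
    return {m: sorted(ids) for m, ids in genes_by_mrna.items() if len(ids) > 1}


# Hypothetical toy rows, not real kgXref data.
rows = [
    "ucTOY1.1\tMRNA_A\n",
    "ucTOY2.1\tMRNA_B\n",
    "ucTOY3.1\tMRNA_A\n",
]
print(shared_mrna_genes(rows))  # {'MRNA_A': ['ucTOY1.1', 'ucTOY3.1']}
```

Each reported mRNA is only a candidate. You would still need to compare the genes' coordinates in knownGene to confirm that they sit at different loci.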

FASTA files (located here: 
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/) whose names contain 
*_hap#* are alternate haplotypes of that chromosome or chromosome section. As it 
says at the top of the page above, the chr*_random sequences are unplaced 
sequences on those reference chromosomes, and the chrUn_* sequences are unlocalized 
sequences whose corresponding reference chromosome has not been determined.
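If you want to sort downloaded files by these naming conventions, the small Python sketch below classifies a sequence or file name by pattern. The function name is hypothetical. The patterns are assumptions based on the file names in this thread and on typical hg19 names such as chr1_gl000191_random and chrUn_gl000211. Naming may differ in other assemblies.

```python
import re


def classify_sequence(name):
    """Classify an hg19 sequence or FASTA file name by its naming pattern.

    Sketch only: the patterns assume names like chr5, chr5_h2_hap1,
    chr1_gl000191_random and chrUn_gl000211.
    """
    name = re.sub(r"\.fa(\.gz)?$", "", name)  # also accept file names
    if name.startswith("chrUn"):
        return "chrUn"       # reference chromosome not yet determined
    if name.endswith("_random"):
        return "random"      # on a known chromosome, position unknown
    if re.search(r"_hap\d+$", name):
        return "haplotype"   # alternate haplotype of a chromosome region
    return "reference"       # e.g. chr5, chrX, chrM


for n in ["chr5.fa.gz", "chr5_h2_hap1.fa.gz",
          "chr1_gl000191_random.fa.gz", "chrUn_gl000211.fa.gz"]:
    print(n, "->", classify_sequence(n))
```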

I hope this information is helpful. Please feel free to contact the mailing list 
again if you require further assistance.

Best,
Mary
------------------
Mary Goldman
UCSC Bioinformatics Group

----- Original Message -----
From: "Carsten Raabe" <[email protected]>
To: [email protected]
Sent: Monday, October 25, 2010 11:37:04 PM GMT -08:00 US/Canada Pacific
Subject: [Genome] further questions

Dear Brooke,

Thanks a lot for your fast reply. I have one more question. Is it 
possible to identify gene duplications within the UCSC gene prediction 
dataset by gene identifiers? For instance, within the RefSeq data 
collection, identical names (NM....) at different genomic locations would 
suggest recent gene duplications. As I have learned, in the UCSC 
dataset a unique identifier is assigned to each alignment. Is it 
possible, based on names alone, to identify alignments of the same RNA at 
different positions as they are represented in the UCSC track?

In a different context, I would like to know whether chromosome FASTA 
files such as chr5_h2_hap1.fa.gz and chr5.fa.gz 
contain identical, overlapping fragments of chr5, or whether they represent 
different, non-overlapping regions of the same chromosome. Furthermore, 
does the content of the random FASTA files overlap with the chr FASTA files, 
or are they entirely different?


Thanks a lot in advance,

Carsten



Hi Carsten,

While it is true that in the RefSeq and mRNA tracks the same identifier 
can be mapped to multiple positions on the genome, every UCSC Gene has a 
unique identifier. For instance, there are 77,614 genes in the hg19 
knownGene table, and 77,614 distinct names (such as "uc001aaa.3") in the 
table.
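You can reproduce the row-count versus distinct-name check described above on downloaded table dumps, and you can use the same approach to list RefSeq names that map to several loci. Below is a minimal Python sketch. It assumes tab-separated files in which knownGene has the name in column 0, chrom in column 1 and txStart in column 3, while refGene has a leading bin column (name in column 1, chrom in column 2, txStart in column 4). Please verify these column positions against the table schema before relying on them. The function names and toy rows are hypothetical.

```python
from collections import defaultdict


def name_counts(rows, name_col):
    """Return (number of rows, number of distinct names) for tab-separated rows."""
    names = [line.rstrip("\n").split("\t")[name_col] for line in rows if line.strip()]
    return len(names), len(set(names))


def multi_locus_names(rows, name_col, chrom_col, start_col):
    """Return {name: [(chrom, start), ...]} for names found at more than one locus."""
    loci = defaultdict(set)
    for line in rows:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        loci[fields[name_col]].add((fields[chrom_col], int(fields[start_col])))
    return {n: sorted(p) for n, p in loci.items() if len(p) > 1}


# Hypothetical toy rows. Assumed layouts: knownGene is name, chrom, strand,
# txStart, ...; refGene is bin, name, chrom, strand, txStart, ...
known_gene = ["ucTOY1.1\tchr1\t+\t100\t200\n", "ucTOY2.1\tchr2\t-\t300\t400\n"]
ref_gene = [
    "0\tNM_TOY1\tchr1\t+\t100\t200\n",
    "0\tNM_TOY1\tchr7\t+\t500\t600\n",
    "0\tNM_TOY2\tchr3\t+\t10\t20\n",
]
print(name_counts(known_gene, name_col=0))  # (2, 2): every name is unique
print(multi_locus_names(ref_gene, name_col=1, chrom_col=2, start_col=4))
# {'NM_TOY1': [('chr1', 100), ('chr7', 500)]}
```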

If you have further questions, please feel free to contact us again at 
[email protected].

-- 
Brooke Rhead
UCSC Genome Bioinformatics Group


On 10/25/10 07:39, Carsten Raabe wrote:
Dear Madam or Sir,

I have a question concerning the Human UCSC gene prediction track. 
RefSeq and GenBank RNAs are aligned to the genome with BLAT, keeping 
only the best alignments for each RNA and discarding alignments of less 
than 98% identity. Imagine a recent gene duplication event leading to 
identical alignments at two or more locations. Is a unique uc name 
given to each alignment, corresponding to each location within the 
genome, or are identical uc names ascribed to equally good alignments 
of the same RNA at different positions? If the second scenario is 
correct, how can gene duplications be identified within the UCSC gene 
prediction set?
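To make the scenario concrete, here is a simplified Python sketch of the filter described above. For each RNA, it discards alignments below 98% identity and then keeps only the alignments tied for the best identity. Under this simplification, an RNA from a recent duplication keeps one alignment per copy. This is not the actual UCSC pipeline, whose exact criteria may differ. The tuple layout, function name and toy data are hypothetical.

```python
from collections import defaultdict


def best_alignments(alignments, min_identity=0.98):
    """Return {rna: [loci]} keeping alignments tied for best identity.

    alignments: iterable of (rna_id, locus, identity) with identity in [0, 1].
    Alignments below min_identity are discarded first.
    """
    hits_by_rna = defaultdict(list)
    for rna, locus, identity in alignments:
        if identity >= min_identity:
            hits_by_rna[rna].append((locus, identity))
    kept = {}
    for rna, hits in hits_by_rna.items():
        best = max(identity for _, identity in hits)
        # Equally good alignments at different loci all survive.
        kept[rna] = sorted(locus for locus, identity in hits if identity == best)
    return kept


# Hypothetical toy alignments for two RNAs.
toy = [
    ("RNA_A", "chr1:100", 0.995),
    ("RNA_A", "chr7:500", 0.995),  # duplicated copy, equally good
    ("RNA_A", "chr9:50", 0.985),   # passes 98% but is not the best
    ("RNA_B", "chr2:10", 0.970),   # below 98%, discarded
]
print(best_alignments(toy))  # {'RNA_A': ['chr1:100', 'chr7:500']}
```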

Thanks in advance,

Carsten
_______________________________________________
Genome maillist  - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome