[Genome] further questions

Carsten Raabe Tue, 26 Oct 2010 13:25:19 -0700

Dear Brooke,

thanks a lot for your fast reply. I do have one more question. Is it 
possible to identify gene duplications within the UCSC gene predictions 
dataset by gene identifier? For instance, within the RefSeq data 
collection, identical names (NM....) at different genomic locations would 
suggest recent gene duplications. As I have come to understand, in the 
UCSC dataset a unique identifier is assigned to each alignment. Is it 
possible to identify, based on names only, alignments of the same RNA at 
different positions as they are represented in the UCSC track?


In a different context, I would like to know whether chromosome FASTA 
files such as chr5_h2_hap1.fa.gz and chr5.fa.gz contain identical, 
overlapping fragments of chr5, or whether they represent different, 
non-overlapping regions of the same chromosome. Furthermore, does the 
content of the random FASTA files overlap with the chr FASTA files, or 
are they entirely different?


Thanks a lot in advance,

Carsten



Hi Carsten,

While it is true that in the RefSeq and mRNA tracks the same identifier 
can be mapped to multiple positions on the genome, 
every UCSC Gene has a unique identifier.  For instance, there are 77,614 
genes in the hg19 knownGene table, and 77,614 distinct names (such as 
"uc001aaa.3") in the table.
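
For tracks where one identifier can appear at several positions, such as 
RefSeq, the following is a minimal sketch of how names found at more than 
one location could be listed. It assumes the alignments have already been 
read into (name, chrom, start, end) tuples; the function name and the 
example rows are hypothetical and are not taken from any UCSC table. Note 
that a name at several positions only suggests a possible duplication; it 
does not demonstrate one.

```python
from collections import defaultdict


def names_at_multiple_loci(rows):
    """Return names that occur at two or more distinct genomic positions.

    rows: iterable of (name, chrom, start, end) tuples. This layout is an
    assumption for illustration, not the column order of a UCSC table.
    """
    loci = defaultdict(set)
    for name, chrom, start, end in rows:
        # A set drops exact repeats of the same position for a name.
        loci[name].add((chrom, start, end))
    # Keep only names seen at more than one distinct position.
    return {name: sorted(places)
            for name, places in loci.items()
            if len(places) > 1}


# Made-up rows for illustration only; not real annotation data.
example_rows = [
    ("NM_000001", "chr1", 1000, 5000),
    ("NM_000001", "chr5", 20000, 24000),
    ("NM_000002", "chr2", 300, 900),
]

duplicated = names_at_multiple_loci(example_rows)
for name, places in duplicated.items():
    print(name, places)
```

Run on the knownGene names, a check like this would be expected to report 
nothing, since each name there occurs only once.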

If you have further questions, please feel free to contact us again at 
[email protected].

-- 
Brooke Rhead
UCSC Genome Bioinformatics Group


On 10/25/10 07:39, Carsten Raabe wrote:
Dear Madame, dear Sir,

I do have a question concerning the Human UCSC gene prediction track. 
RefSeq and GenBank RNAs are aligned to the genome with BLAT, keeping 
only the best alignments for each RNA and discarding alignments of less 
than 98% identity. Imagine a recent gene duplication event, leading to 
identical alignments at two or more locations. Are unique uc names 
given to each and every alignment corresponding to each location 
within the genome, or are identical uc names assigned to equally 
good alignments of the same RNA in different positions? If the second 
scenario is correct, how can gene duplications be identified within the 
UCSC gene prediction set?

Thanks in advance,

Carsten
_______________________________________________
Genome maillist  - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
