Dear Brooke, thanks a lot for your fast reply. I have one more question. Is it possible to use gene identifiers to find gene duplications within the UCSC gene prediction dataset? For instance, within the RefSeq data collection, identical names (NM....) at different genomic locations would suggest recent gene duplications. As I have learned, in the UCSC dataset a unique identifier is assigned to each alignment. Is it possible, based on names alone, to identify alignments of the same RNA at different positions as they are represented in the UCSC track?
In a different context, I would like to know whether chromosome FASTA files such as chr5_h2_hap1.fa.gz and chr5.fa.gz contain identical, overlapping fragments of chr5, or whether they represent different, non-overlapping regions of the same chromosome. Furthermore, does the content of the random FASTA files overlap with the chr FASTA files, or are they entirely different?

Thanks a lot in advance,
Carsten

Hi Carsten,

While it is true that the RefSeq and mRNA tracks can have the same identifier mapped to multiple positions on the genome, every UCSC Gene has a unique identifier. For instance, there are 77,614 genes in the hg19 knownGene table, and 77,614 distinct names (such as "uc001aaa.3") in the table.

If you have further questions, please feel free to contact us again at [email protected].

--
Brooke Rhead
UCSC Genome Bioinformatics Group

On 10/25/10 07:39, Carsten Raabe wrote:
Dear Madame, dear Sir,

I have a question concerning the human UCSC gene prediction track. RefSeq and GenBank RNAs are aligned to the genome with BLAT, keeping only the best alignments for each RNA and discarding alignments of less than 98% identity. Imagine a recent gene duplication event leading to identical alignments at two or more locations. Is a unique uc name given to each and every alignment, one for each location within the genome, or are identical uc names ascribed to equally good alignments of the same RNA at different positions? If the second scenario is correct, how can gene duplications be identified within the UCSC gene prediction set?

Thanks in advance,
Carsten

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
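Brooke's point, that a RefSeq accession can be placed at several positions while every UCSC Genes (knownGene) name is unique, can be checked directly against the UCSC table dumps. The following is a minimal Python sketch, not a UCSC tool. It assumes the tables were downloaded as tab-separated text (for example refGene.txt.gz and knownGene.txt.gz) in genePred layout, starting with name, chrom, strand, txStart, txEnd. It also assumes that refGene rows carry an extra leading bin column and knownGene rows do not; check the accompanying .sql schema to confirm. The helper names `read_gene_pred`, `multi_mapped_names` and `row_and_name_counts` are hypothetical.

```python
import gzip
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# One placement of a transcript: (chrom, strand, txStart, txEnd)
Locus = Tuple[str, str, int, int]
# One parsed table row: (name, chrom, strand, txStart, txEnd)
Record = Tuple[str, str, str, int, int]


def read_gene_pred(path: str, has_bin: bool) -> Iterator[Record]:
    """Yield (name, chrom, strand, txStart, txEnd) from a genePred-style dump.

    Assumption: tab-separated columns name, chrom, strand, txStart, txEnd, ...,
    optionally preceded by a 'bin' column (has_bin=True, e.g. for refGene).
    """
    offset = 1 if has_bin else 0
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < offset + 5:
                continue  # skip blank or truncated lines
            name, chrom, strand, start, end = fields[offset:offset + 5]
            yield name, chrom, strand, int(start), int(end)


def multi_mapped_names(records: Iterable[Record]) -> Dict[str, List[Locus]]:
    """Map each name placed at more than one distinct locus to its sorted loci."""
    loci: Dict[str, Set[Locus]] = defaultdict(set)
    for name, chrom, strand, start, end in records:
        # Identical rows collapse into one locus, so only distinct placements count.
        loci[name].add((chrom, strand, start, end))
    return {name: sorted(locs) for name, locs in loci.items() if len(locs) > 1}


def row_and_name_counts(records: Iterable[Record]) -> Tuple[int, int]:
    """Return (number of rows, number of distinct names)."""
    names = [record[0] for record in records]
    return len(names), len(set(names))
```

For example, `ref = list(read_gene_pred("refGene.txt.gz", has_bin=True))` followed by `multi_mapped_names(ref)` would list accessions placed more than once. Read the rows into a list first because a generator can only be consumed once. According to Brooke's reply, `row_and_name_counts` on the hg19 knownGene table should return two equal numbers. A multi-placed accession is only a candidate duplication. Placements on alternate haplotype or random sequences (such as chr5_h2_hap1) may also appear, so those chromosomes might need to be filtered out before interpreting the result.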
