Hi Arnoldo,

While we do not provide advice on research direction, one of our engineers had these comments about your tables and tracks of interest:

"Some species don't have refFlat files because they have no native RefSeq
mRNAs. The xeno RefSeq alignments are done using protein-translated BLAT.
They have the drawback that they easily align to paralogs and pseudogenes.
They also tend not to align UTRs very well, and hence are not a good
indication of the transcription start. TransMap would probably be a better
source of genes. Its alignments are filtered by synteny and do a much
better job of aligning UTRs. The drawback is that the gene alignments are
restricted to genes from species with pairwise genomic alignments. I would
suggest using either the TransMap RefSeq or mRNA alignments. TransMap
RefSeq would be a cleaner set of data; however, mRNAs would be more
comprehensive."

Best,
Mary
---------------------
Mary Goldman
UCSC Bioinformatics Group

On 10/5/10 4:08 PM, Arnoldo Jose Muller-Molina wrote:
> Hello Mary, members of the list:
>
> Well, I would like to do some alignments on the different promoters of
> different species.
> I am running a phylogenetic foot-printing technique and I want to have
> as many genes from different species as possible.
>
>
> From what I could gather from the mailing list, xenoRefFlat consists of
> other genes from other species aligned into the organism. This would
> give me a larger list of genes because some organisms only have 1000+
> gene annotations in refFlat. Do you think using xenoRefFlat for this
> purpose makes sense?
>
> Regards,
>
> Arnoldo Muller
>
> On Tue, 2010-10-05 at 15:51 -0700, Mary Goldman wrote:
>
>> Hi Arnoldo,
>>
>> The answer to your question depends on what you are going to use the data
>> for. Please keep in mind that the UCSC Genome Browser simply displays data;
>> it does not say what is or is not acceptable analysis of this data.
>>
>> Please feel free to contact the mail list again if you require further
>> assistance.
>>
>> Best,
>> Mary
>> ------------------
>> Mary Goldman
>> UCSC Bioinformatics Group
>>
>> ----- Original Message -----
>> From: "Arnoldo Jose Muller-Molina"<[email protected]>
>> To: [email protected]
>> Sent: Tuesday, October 5, 2010 1:31:56 PM GMT -08:00 US/Canada Pacific
>> Subject: [Genome] About xenorefflat
>>
>> Hello!
>>
>> I would like to extract promoter regions of various sizes from different
>> vertebrates. I am aware that you provide the upstreamXXXX.fa.gz files
>> but I would like to have them repeatmasked.
>>
>> I decided to extract my data directly from the chromosomes using
>> refFlat.txt files. I have noticed that some organisms have a small
>> number of entries. Some organisms like the Panda do not have refFlat.txt
>> at all. Would it be safe to approximate promoter regions with
>> xenoRefFlat?
>>
>> Regards,
>>
>>
>
>
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
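
For the promoter extraction discussed in this thread, here is a minimal Python sketch of how strand-aware upstream coordinates might be derived from one tab-separated record in refFlat column order (geneName, name, chrom, strand, txStart, txEnd, ...). It is an illustration only, not a UCSC tool: the function name promoter_interval, the upstream and chrom_size parameters, and the sample records are all hypothetical. It assumes 0-based, half-open coordinates and that txStart/txEnd are acceptable proxies for the transcript ends, which, per the comments above, may not hold well for xeno RefSeq alignments.

```python
def promoter_interval(record_line, upstream=1000, chrom_size=None):
    """Return (chrom, start, end, strand) for the region upstream of a transcript.

    Assumes refFlat column order and 0-based, half-open coordinates.
    `upstream` and `chrom_size` are illustrative parameters.
    """
    fields = record_line.rstrip("\n").split("\t")
    chrom, strand = fields[2], fields[3]
    tx_start, tx_end = int(fields[4]), int(fields[5])

    if strand == "+":
        # Transcription begins at txStart; the promoter lies just before it.
        start, end = max(0, tx_start - upstream), tx_start
    elif strand == "-":
        # On the minus strand transcription begins at txEnd.
        start, end = tx_end, tx_end + upstream
        if chrom_size is not None:
            end = min(end, chrom_size)  # do not run off the chromosome end
    else:
        raise ValueError("unexpected strand: %r" % strand)
    return chrom, start, end, strand


# Hypothetical records, for illustration only.
plus_rec = "GENE1\tNM_000001\tchr1\t+\t5000\t9000\t5100\t8900\t1\t5000,\t9000,"
minus_rec = "GENE2\tNM_000002\tchr1\t-\t5000\t9000\t5100\t8900\t1\t5000,\t9000,"
print(promoter_interval(plus_rec))
print(promoter_interval(minus_rec, chrom_size=9500))
```

The returned interval can then be used to slice whichever repeat-masked chromosome sequence you prefer; minus-strand slices would additionally need to be reverse-complemented.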
