Hi Sean,

Yes, you can get the entire genome alignment here: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/multiz46way/maf/. Be aware that these files are large: about 31 GB compressed and more than 250 GB uncompressed. For a description of the Multiple Alignment Format (MAF), see http://genome.ucsc.edu/goldenPath/help/maf.html.
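
Because the files are this large, it may be easier to stream them than to decompress them to disk first. Below is a minimal Python sketch, not a UCSC tool, that reads a gzip-compressed MAF file block by block. It assumes only the basic `a`/`s` line layout from the MAF help page (`s src start size strand srcSize text`) and skips the other line types (`i`, `e`, `q`). The helper names, the toy sample, and the file name in the usage comment are hypothetical.

```python
import gzip
import io
from typing import Iterator, List, NamedTuple, TextIO


class MafSeq(NamedTuple):
    """One 's' line of a MAF block: s src start size strand srcSize text."""
    src: str        # e.g. "hg19.chr1" (assembly.sequence)
    start: int      # zero-based start of the aligned region
    size: int       # number of non-gap bases in `text`
    strand: str     # "+" or "-"
    src_size: int   # full length of the source sequence
    text: str       # aligned bases, with '-' for gaps


def parse_maf(handle: TextIO) -> Iterator[List[MafSeq]]:
    """Yield each alignment block as a list of its 's' records.

    Only 'a' and 's' lines are interpreted; header, comment, and
    'i'/'e'/'q' lines are skipped in this sketch.
    """
    block: List[MafSeq] = []
    for line in handle:
        fields = line.split()
        if not fields:
            # A blank line ends the current block.
            if block:
                yield block
                block = []
            continue
        kind = fields[0]
        if kind == "a":
            # An 'a' line starts a new block.
            if block:
                yield block
            block = []
        elif kind == "s":
            _, src, start, size, strand, src_size, text = fields
            block.append(MafSeq(src, int(start), int(size), strand, int(src_size), text))
    if block:
        yield block


def iter_maf_file(path: str) -> Iterator[List[MafSeq]]:
    """Stream a gzip-compressed MAF file without decompressing it to disk."""
    with gzip.open(path, "rt") as handle:
        yield from parse_maf(handle)


# Made-up toy input, for illustration only.
SAMPLE = """##maf version=1
a score=10.0
s hg19.chr1 100 10 + 249250621 ACGTACGTAC
s gorGor1.example 500 9 + 100000 ACGT-CGNAC

a score=5.0
s hg19.chr1 200 5 + 249250621 ACG-TA
"""

# Usage with a downloaded file (hypothetical name):
#   for block in iter_maf_file("chr21.maf.gz"):
#       ...

if __name__ == "__main__":
    for block in parse_maf(io.StringIO(SAMPLE)):
        print([seq.src for seq in block])
```

Reading in text mode through `gzip.open` keeps memory use to one block at a time, regardless of file size.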

Also, in response to your other email, stop codons are represented with a Z. Information about this file format, including non-protein characters, can be found here: http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#FASTA.

I hope this information is helpful. Please contact us again at [email protected] if you have any further questions.

Best,
Mary
------------------
Mary Goldman
UCSC Bioinformatics Group

On 7/6/11 4:38 PM, Xiang Li wrote:
>
> Hi, Mary,
>
> I got it. Thanks a lot!
>
> BTW, from the webpage you pointed me to, it seems there are multiple
> alignments at the DNA level between entire genomes, i.e., not only
> for CDS regions, but also for entire exonic and intronic regions.
>
> Is my understanding correct? If so, could you please tell me how
> to get those MAFs?
>
> Thanks!
>
> Sean
>
> *From:* Mary Goldman [mailto:[email protected]]
> *Sent:* Wednesday, July 06, 2011 4:30 PM
> *To:* Xiang Li
> *Cc:* [email protected]
> *Subject:* Re: [Genome] [help] Lots of stop codons in multiz46way protein alignment file
>
> Hi Sean,
>
> You can also view the Gorilla browser at our preview site here:
> http://genome-preview.cse.ucsc.edu/cgi-bin/hgTracks?db=gorGor1. It
> tends to be more reliably available than our test site. Our preview
> site carries the same warning that tracks and data on the test server
> have not undergone formal quality assurance.
>
> Best,
> Mary
> ---------------------
> Mary Goldman
> UCSC Bioinformatics Group
>
> On 7/6/11 4:16 PM, Mary Goldman wrote:
>
> Hi Sean,
>
> Codons with an N in any position are represented with an X (stop
> codons are represented with a Z). Assemblies that are not well
> sequenced, such as the Gorilla (gorGor1), will have quite a few Ns
> (which are bases with low quality scores) and, thus, quite a few Xs in
> the protein alignment file. You can confirm this by viewing the
> gorGor1 assembly on our test browser here:
> http://genome-test.cse.ucsc.edu/cgi-bin/hgTracks?db=gorGor1. Please
> note that tracks and data on the test server have not undergone formal
> quality assurance.
>
> More information about this file format can be found here:
> http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#FASTA.
>
> I hope this information is helpful. Please contact us again at
> [email protected] if you have any further questions.
>
> Best,
> Mary
> ------------------
> Mary Goldman
> UCSC Bioinformatics Group
>
> On 7/6/11 3:21 PM, Xiang Li wrote:
> > Hi, Dear Support,
> >
> > It would be easy to understand if they were at the end of a protein
> > sequence. However, could you please help me understand why there are so
> > many Xs inside some sequences?
> >
> > http://hgdownload.cse.ucsc.edu/goldenPath/hg19/multiz46way/alignments/refGene.exonAA.fa.gz
> >
> > >NM_000152_gorGor1_18_19 51 0 0 Supercontig_0039638:17387-17539+
> > NXIXNELVXVTSEGAGLQLQKVTVLGVATAPQQVXSNGVPVSNFTYSPDTK
> > --
> > >NM_001079803_gorGor1_18_19 51 0 0 Supercontig_0039638:17387-17539+
> > NXIXNELVXVTSEGAGLQLQKVTVLGVATAPQQVXSNGVPVSNFTYSPDTK
> > --
> > >NM_001079804_gorGor1_18_19 51 0 0 Supercontig_0039638:17387-17539+
> > NXIXNELVXVTSEGAGLQLQKVTVLGVATAPQQVXSNGVPVSNFTYSPDTK
> >
> > There are more than 30,000 sequences with X like that. Please help.
> > Thanks!
> >
> > Sean
> >
> > Sean (Xiang) Li, Ph.D
> > Bioinformatics Scientist
> > Ambry Genetics
> > [email protected]
> > Direct 949-900-5504
> > Fax 949-900-5501
> >
> > _______________________________________________
> > Genome maillist [email protected]
> > https://lists.soe.ucsc.edu/mailman/listinfo/genome
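
To make the X/Z convention concrete, here is a minimal Python sketch of a codon-by-codon translation that applies it: a codon with an N in any position becomes X, and a stop codon becomes Z. It is an illustration under stated assumptions, not the code UCSC uses to build the exonAA files. It assumes in-frame DNA and the standard genetic code, and it does not handle alignment gaps, other ambiguity codes, or codons split across exon boundaries.

```python
# Standard genetic code, codons enumerated with bases in T, C, A, G order
# (the layout used by NCBI translation table 1); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string codon by codon.

    Any codon containing N becomes 'X'; a stop codon becomes 'Z'.
    Trailing bases that do not form a full codon are ignored, and
    characters other than A/C/G/T/N raise KeyError.
    """
    seq = dna.upper()
    protein = []
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[pos:pos + 3]
        if "N" in codon:
            protein.append("X")  # low-quality base somewhere in the codon
        else:
            amino_acid = CODON_TABLE[codon]
            protein.append("Z" if amino_acid == "*" else amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGNCATAAGGN"))  # MXZX
```

A single N anywhere in a codon is enough to produce an X. This is why a poorly sequenced assembly such as gorGor1 yields Xs throughout a protein, not only at its ends.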

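To see which assemblies account for the X-containing records in refGene.exonAA.fa.gz, one could tally them per assembly. The Python sketch below rests on an assumption inferred from the example headers in this thread, not from UCSC documentation: that the first header token has the form `<transcript>_<assembly>_<exon>_<exonCount>` (e.g. `NM_000152_gorGor1_18_19`). The `--` lines in the quoted output appear to be `grep` context separators and are not part of the file. All function names are hypothetical.

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; sequences may span several lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def assembly_of(header: str) -> str:
    """Assumed layout: 'NM_000152_gorGor1_18_19 ...' -> 'gorGor1'."""
    name = header.split()[0]
    return name.rsplit("_", 3)[1]


def count_x_records(lines: Iterable[str]) -> Counter:
    """Count, per assembly, the records whose protein sequence contains an X."""
    counts: Counter = Counter()
    for header, protein in read_fasta(lines):
        if "X" in protein:
            counts[assembly_of(header)] += 1
    return counts


def count_x_in_file(path: str) -> Counter:
    """Stream a gzip-compressed FASTA file such as refGene.exonAA.fa.gz."""
    with gzip.open(path, "rt") as handle:
        return count_x_records(handle)


# Usage (assumes the file has been downloaded locally):
#   count_x_in_file("refGene.exonAA.fa.gz").most_common(10)
```

If the counts concentrate in low-coverage assemblies such as gorGor1, that would be consistent with the explanation in the thread that the Xs come from Ns in the underlying genomic sequence.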