Hi, Mary,

Could you please instruct me how to get the entire genome alignment based on Homo sapiens hg19? From the file I downloaded, it seems like hg18.

Thanks
Sean

From: Mary Goldman [mailto:[email protected]]
Sent: Thursday, July 07, 2011 11:50 AM
To: Xiang Li
Cc: [email protected]
Subject: Re: [Genome] [help] Lots of stop codons in multiz46way protein alignment file

Hi Sean,

Yes, you can get the entire genome alignment here: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/multiz46way/maf/. Beware, the compressed data size of these files is 31 Gb and uncompressed is more than 250 Gb. For a description of multiple alignment format (MAF), see http://genome.ucsc.edu/goldenPath/help/maf.html.

Also, in response to your other email, stop codons are represented with a Z. Information about this file format, including non-protein characters, can be found here: http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#FASTA.

I hope this information is helpful. Please contact us again at [email protected] if you have any further questions.

Best,
Mary
------------------
Mary Goldman
UCSC Bioinformatics Group

On 7/6/11 4:38 PM, Xiang Li wrote:

Hi, Mary,

I got it. Thanks a lot!

BTW, from the webpage you pointed me to, it seems there are multiple alignments at the DNA level between entire genomes, i.e., not just for CDS regions, but also for entire exonic and intronic regions. Is my understanding correct? If so, could you please instruct me how to get those MAFs?

Thanks!
Sean

From: Mary Goldman [mailto:[email protected]]
Sent: Wednesday, July 06, 2011 4:30 PM
To: Xiang Li
Cc: [email protected]
Subject: Re: [Genome] [help] Lots of stop codons in multiz46way protein alignment file

Hi Sean,

You can also view the Gorilla browser at our preview site here: http://genome-preview.cse.ucsc.edu/cgi-bin/hgTracks?db=gorGor1. It tends to be more reliably available than our test site.
Our preview site carries the same warning that tracks and data on the test server have not undergone formal quality assurance.

Best,
Mary
---------------------
Mary Goldman
UCSC Bioinformatics Group

On 7/6/11 4:16 PM, Mary Goldman wrote:

Hi Sean,

Codons with an N in any position are represented with an X (stop codons are represented with a Z). Assemblies that are not well sequenced, such as the Gorilla (gorGor1), will have quite a few Ns (which are bases with low quality scores) and, thus, quite a few Xs in the protein alignment file. You can confirm this by viewing the gorGor1 assembly on our test browser here: http://genome-test.cse.ucsc.edu/cgi-bin/hgTracks?db=gorGor1. Please note that tracks and data on the test server have not undergone formal quality assurance.

More information about this file format can be found here: http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#FASTA.

I hope this information is helpful. Please contact us again at [email protected] if you have any further questions.

Best,
Mary
------------------
Mary Goldman
UCSC Bioinformatics Group

On 7/6/11 3:21 PM, Xiang Li wrote:

Hi, Dear Support,

It would be easy to understand if they are at the end of a protein sequence. However, could you please help me understand why there are so many "X"es inside some sequences?

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/multiz46way/alignments/refGene.exonAA.fa.gz

NM_000152_gorGor1_18_19 51 0 0 Supercontig_0039638:17387-17539+
NXIXNELVXVTSEGAGLQLQKVTVLGVATAPQQVXSNGVPVSNFTYSPDTK
--
NM_001079803_gorGor1_18_19 51 0 0 Supercontig_0039638:17387-17539+
NXIXNELVXVTSEGAGLQLQKVTVLGVATAPQQVXSNGVPVSNFTYSPDTK
--
NM_001079804_gorGor1_18_19 51 0 0 Supercontig_0039638:17387-17539+
NXIXNELVXVTSEGAGLQLQKVTVLGVATAPQQVXSNGVPVSNFTYSPDTK

There are more than 30,000 sequences with X like that. Please help.

Thanks!
Sean

Sean (Xiang) Li, Ph.D
Bioinformatics Scientist
Ambry Genetics
[email protected]
Direct 949-900-5504
Fax 949-900-5501

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
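
For anyone who wants to check how the X and Z characters discussed in this thread are distributed across assemblies, here is a minimal Python sketch. It rests on assumptions that the thread does not confirm: that each record header in refGene.exonAA.fa.gz starts with ">" as in ordinary FASTA (the ">" does not appear in the records quoted above), and that the first header field has the form ACCESSION_ASSEMBLY_EXON_EXONCOUNT, as in NM_000152_gorGor1_18_19. The function names are hypothetical and are not part of any UCSC tool.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record begins; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def assembly_of(header: str) -> str:
    """Return the assembly name from a header.

    Assumption: the first field looks like ACCESSION_ASSEMBLY_EXON_EXONCOUNT,
    where the accession may itself contain underscores (e.g. NM_000152).
    """
    name = header.split()[0]
    parts = name.rsplit("_", 3)
    return parts[1] if len(parts) == 4 else "unknown"


def tally_x_and_z(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count X (codon containing an N) and Z (stop codon) per assembly."""
    counts: Dict[str, Counter] = {}
    for header, seq in read_fasta(lines):
        tally = counts.setdefault(assembly_of(header), Counter())
        tally["X"] += seq.count("X")
        tally["Z"] += seq.count("Z")
        tally["records"] += 1
    return counts


# One record from the thread, with the assumed ">" prefix added.
sample = [
    ">NM_000152_gorGor1_18_19 51 0 0 Supercontig_0039638:17387-17539+",
    "NXIXNELVXVTSEGAGLQLQKVTVLGVATAPQQVXSNGVPVSNFTYSPDTK",
]
sample_counts = tally_x_and_z(sample)
```

To run this on the downloaded file, the lines can be read with the standard library call gzip.open(path, "rt") and passed to tally_x_and_z. If the explanation above holds, a low-coverage assembly such as gorGor1 would be expected to show a much higher X count than well-sequenced assemblies.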
