Hi Sean,

Yes, you can get the entire genome alignment here: 
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/multiz46way/maf/. Be aware 
that these files total 31 GB compressed and more than 250 GB 
uncompressed. For a description of the multiple alignment format (MAF), 
see http://genome.ucsc.edu/goldenPath/help/maf.html.
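
Given the size of the files, it may be easier to stream them one 
alignment block at a time than to uncompress them to disk. Here is a 
minimal Python sketch, assuming only the "a" and "s" line types described 
on the MAF help page. The names iter_maf_blocks, MafSeq and open_maf are 
made up for illustration and are not part of any UCSC tool, and the file 
name in the last comment is a hypothetical local copy:

```python
import gzip
from typing import Iterator, List, NamedTuple, TextIO


class MafSeq(NamedTuple):
    src: str       # source name, e.g. "hg19.chr21"
    start: int     # zero-based start on the given strand
    size: int      # number of non-gap bases in this row
    strand: str    # "+" or "-"
    src_size: int  # total length of the source sequence
    text: str      # aligned bases, with "-" marking gaps


def iter_maf_blocks(handle: TextIO) -> Iterator[List[MafSeq]]:
    """Yield the "s" rows of each alignment block as a list."""
    block: List[MafSeq] = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # blank lines separate blocks
        if fields[0] == "a":
            # An "a" line opens a new block; emit the previous one first.
            if block:
                yield block
            block = []
        elif fields[0] == "s" and len(fields) == 7:
            _, src, start, size, strand, src_size, text = fields
            block.append(
                MafSeq(src, int(start), int(size), strand, int(src_size), text)
            )
        # Other line types ("#", "i", "e", "q") are skipped in this sketch.
    if block:
        yield block


def open_maf(path: str) -> TextIO:
    """Open a gzip-compressed MAF file as text without unpacking it to disk."""
    return gzip.open(path, "rt")


# Example (hypothetical local file name):
# with open_maf("chr21.maf.gz") as handle:
#     for block in iter_maf_blocks(handle):
#         print(block[0].src, block[0].start, len(block))
```

Each yielded block is a list of rows, so only one block is held in memory 
at a time.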

Also, in response to your other email, stop codons are represented with 
a Z. Information about this file format, including non-protein 
characters, can be found here: 
http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#FASTA.
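
If you want to see how many of these characters each record contains, 
something like the following Python sketch may help. It assumes the 
protein alignment file is ordinary FASTA with ">" header lines, and it 
relies on the convention from earlier in this thread that X marks a codon 
containing an N and Z marks a stop codon. The function names iter_fasta 
and count_x_and_z are made up for illustration:

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def iter_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_x_and_z(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Map each header to its counts of X (codon with an N) and Z (stop)."""
    result: Dict[str, Dict[str, int]] = {}
    for header, seq in iter_fasta(lines):
        counts = Counter(seq)
        result[header] = {"X": counts["X"], "Z": counts["Z"]}
    return result
```

Any iterable of lines works as input, including a file object opened 
with gzip.open(path, "rt").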

I hope this information is helpful. Please contact us again at 
[email protected] if you have any further questions.

Best,
Mary
------------------
Mary Goldman
UCSC Bioinformatics Group

On 7/6/11 4:38 PM, Xiang Li wrote:
>
> Hi, Mary,
>
> I got it. Thanks a lot!
>
> BTW, from the webpage you pointed me to, it seems there are multiple 
> alignments at the DNA level between entire genomes, i.e., not just 
> for CDS regions, but also for entire exons and introns.
>
> Is my understanding correct? If so, could you please tell me how 
> to get those MAFs?
>
> Thanks!
>
> Sean
>
> From: Mary Goldman [mailto:[email protected]]
> Sent: Wednesday, July 06, 2011 4:30 PM
> To: Xiang Li
> Cc: [email protected]
> Subject: Re: [Genome] [help] Lots of stop codons in multiz46way 
> protein alignment file
>
> Hi Sean,
>
> You can also view the Gorilla browser at our preview site here: 
> http://genome-preview.cse.ucsc.edu/cgi-bin/hgTracks?db=gorGor1. It 
> tends to be more reliably available than our test site. Our preview 
> site carries the same warning that tracks and data on the test server 
> have not undergone formal quality assurance.
>
> Best,
> Mary
> ---------------------
> Mary Goldman
> UCSC Bioinformatics Group
>
> On 7/6/11 4:16 PM, Mary Goldman wrote:
>
> Hi Sean,
>
> Codons with an N in any position are represented with an X (stop 
> codons are represented with a Z). Assemblies that are not well 
> sequenced, such as the Gorilla (gorGor1), will have quite a few Ns 
> (which are bases with low quality scores) and, thus, quite a few Xs in 
> the protein alignment file. You can confirm this by viewing the 
> gorGor1 assembly on our test browser here: 
> http://genome-test.cse.ucsc.edu/cgi-bin/hgTracks?db=gorGor1. Please 
> note that tracks and data on the test server have not undergone formal 
> quality assurance.
>
> More information about this file format can be found here: 
> http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#FASTA.
>
> I hope this information is helpful. Please contact us again at 
> [email protected] if you have any further questions.
>
> Best,
> Mary
> ------------------
> Mary Goldman
> UCSC Bioinformatics Group
>
>
>
> On 7/6/11 3:21 PM, Xiang Li wrote:
>
> Hi, Dear Support,
>
> It would be easy to understand if they are at the end of a protein
> sequence. However, could you please help me understand why there are so
> many "X"es inside some sequences?
>
> http://hgdownload.cse.ucsc.edu/goldenPath/hg19/multiz46way/alignments/refGene.exonAA.fa.gz
>
>     NM_000152_gorGor1_18_19 51 0 0 Supercontig_0039638:17387-17539+
>
> NXIXNELVXVTSEGAGLQLQKVTVLGVATAPQQVXSNGVPVSNFTYSPDTK
>   
> --
>   
>
>     NM_001079803_gorGor1_18_19 51 0 0 Supercontig_0039638:17387-17539+
>
> NXIXNELVXVTSEGAGLQLQKVTVLGVATAPQQVXSNGVPVSNFTYSPDTK
>   
> --
>   
>
>     NM_001079804_gorGor1_18_19 51 0 0 Supercontig_0039638:17387-17539+
>
> NXIXNELVXVTSEGAGLQLQKVTVLGVATAPQQVXSNGVPVSNFTYSPDTK
>
> There are more than 30,000 sequences with X like that. Please help.
> Thanks!
>
> Sean
>
> Sean (Xiang) Li, Ph.D
> Bioinformatics Scientist
> Ambry Genetics
> [email protected]
> Direct 949-900-5504
> Fax 949-900-5501
>
> _______________________________________________
> Genome maillist  [email protected]  <mailto:[email protected]>
> https://lists.soe.ucsc.edu/mailman/listinfo/genome