Hi Sean,

Codons with an N in any position are represented with an X (stop codons are represented with a Z). Assemblies that are not well sequenced, such as the gorilla (gorGor1), will have quite a few Ns (bases with low quality scores) and, thus, quite a few Xs in the protein alignment file. You can confirm this by viewing the gorGor1 assembly on our test browser here: http://genome-test.cse.ucsc.edu/cgi-bin/hgTracks?db=gorGor1. Please note that tracks and data on the test server have not undergone formal quality assurance.
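
As a rough illustration of the rule described above (any codon containing an N becomes X, and a stop codon becomes Z), here is a minimal Python sketch. It is not the code UCSC uses to build the alignment files; the function name translate_exon and the choice of the standard genetic code are assumptions made for this example.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons
# enumerated in T, C, A, G order; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate_exon(dna):
    """Translate DNA in frame 0 (hypothetical helper, for illustration).

    Codons containing an N are written as X, stop codons as Z.
    Trailing bases that do not fill a whole codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        if "N" in codon:
            protein.append("X")  # unknown base anywhere in the codon
        else:
            # Anything else not in the table is also treated as unknown.
            aa = CODON_TABLE.get(codon, "X")
            protein.append("Z" if aa == "*" else aa)
    return "".join(protein)


print(translate_exon("AATNNNATT"))  # NXI
print(translate_exon("ATGTAA"))     # MZ
```

A single N inside an exon is enough to turn that whole codon into an X, which is why Xs can show up in the middle of a sequence rather than only at its ends.
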
More information about this file format can be found here: http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#FASTA.

I hope this information is helpful. Please contact us again at [email protected] if you have any further questions.

Best,
Mary

------------------
Mary Goldman
UCSC Bioinformatics Group

On 7/6/11 3:21 PM, Xiang Li wrote:
> Hi, Dear Support,
>
> It would be easy to understand if they are at the end of a protein
> sequence. However, could you please help me understand why there are so
> many "X"es inside some sequences?
>
> http://hgdownload.cse.ucsc.edu/goldenPath/hg19/multiz46way/alignments/refGene.exonAA.fa.gz
>
> >NM_000152_gorGor1_18_19 51 0 0 Supercontig_0039638:17387-17539+
> NXIXNELVXVTSEGAGLQLQKVTVLGVATAPQQVXSNGVPVSNFTYSPDTK
> --
> >NM_001079803_gorGor1_18_19 51 0 0 Supercontig_0039638:17387-17539+
> NXIXNELVXVTSEGAGLQLQKVTVLGVATAPQQVXSNGVPVSNFTYSPDTK
> --
> >NM_001079804_gorGor1_18_19 51 0 0 Supercontig_0039638:17387-17539+
> NXIXNELVXVTSEGAGLQLQKVTVLGVATAPQQVXSNGVPVSNFTYSPDTK
>
> There are more than 30,000 sequences with X like that. Please help.
> Thanks!
>
> Sean
>
> Sean (Xiang) Li, Ph.D
> Bioinformatics Scientist
> Ambry Genetics
> [email protected]
> Direct 949-900-5504
> Fax 949-900-5501
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
