Hey, Mary,
I just noticed that there are also some letters after a "Z", such as:

>NM_000152_echTel1_14_19 50 0 2 scaffold_298195:1056-1143+
PQEPYRFGEQAQSAMRKAL-LRYALLPZL---------------------

Any thoughts?

Thanks,
Sean

On 7/6/11 4:16 PM, Mary Goldman wrote:

Hi Sean,

Codons with an N in any position are represented with an X (stop codons are represented with a Z). Assemblies that are not well sequenced, such as the Gorilla (gorGor1), will have quite a few Ns (which are bases with low quality scores) and, thus, quite a few Xs in the protein alignment file. You can confirm this by viewing the gorGor1 assembly on our test browser here: http://genome-test.cse.ucsc.edu/cgi-bin/hgTracks?db=gorGor1. Please note that tracks and data on the test server have not undergone formal quality assurance.

More information about this file format can be found here: http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#FASTA.

I hope this information is helpful. Please contact us again at [email protected] if you have any further questions.

Best,
Mary

------------------
Mary Goldman
UCSC Bioinformatics Group

On 7/6/11 3:21 PM, Xiang Li wrote:

Hi, Dear Support,

The "X"s would be easy to understand if they were at the end of a protein sequence. However, could you please help me understand why there are so many "X"s inside some sequences?

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/multiz46way/alignments/refGene.exonAA.fa.gz

NM_000152_gorGor1_18_19 51 0 0 Supercontig_0039638:17387-17539+
NXIXNELVXVTSEGAGLQLQKVTVLGVATAPQQVXSNGVPVSNFTYSPDTK
--
NM_001079803_gorGor1_18_19 51 0 0 Supercontig_0039638:17387-17539+
NXIXNELVXVTSEGAGLQLQKVTVLGVATAPQQVXSNGVPVSNFTYSPDTK
--
NM_001079804_gorGor1_18_19 51 0 0 Supercontig_0039638:17387-17539+
NXIXNELVXVTSEGAGLQLQKVTVLGVATAPQQVXSNGVPVSNFTYSPDTK

There are more than 30,000 sequences with X like that. Please help.

Thanks!
Sean

Sean (Xiang) Li, Ph.D
Bioinformatics Scientist
Ambry Genetics
[email protected]
Direct 949-900-5504
Fax 949-900-5501

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
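
For readers following the thread, the convention Mary describes (any codon containing an N becomes X, stop codons become Z) can be illustrated with a small Python sketch. This is only an illustration of that rule, not UCSC's actual code: the names `CODON_TABLE`, `translate_codon`, and `translate` are hypothetical, the standard genetic code is assumed, and mapping any other unrecognized codon to X is an assumption made here for simplicity.

```python
from itertools import product

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_codon(codon: str) -> str:
    """Translate one codon using the X/Z convention described in the thread."""
    codon = codon.upper()
    if "N" in codon:
        # A low-quality base (N) anywhere in the codon yields X.
        return "X"
    # Assumption: any other codon not in the table is also reported as X.
    aa = CODON_TABLE.get(codon, "X")
    # Stop codons are written as Z rather than the usual '*'.
    return "Z" if aa == "*" else aa


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(translate_codon(dna[i:i + 3]) for i in range(0, usable, 3))


if __name__ == "__main__":
    # ATG is Met, NCA contains an N, TAA is a stop codon.
    print(translate("ATGNCATAA"))
```

Under this rule, an X in the middle of a sequence simply marks a codon that overlaps a low-quality base in the assembly, which matches the gorGor1 examples quoted above.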
