Hey, Mary,

 

I just noticed that there are also some letters after a "Z", such as

 

>NM_000152_echTel1_14_19 50 0 2 scaffold_298195:1056-1143+

PQEPYRFGEQAQSAMRKAL-LRYALLPZL---------------------

 

Any thoughts? Thanks

 

Sean

 


On 7/6/11 4:16 PM, Mary Goldman wrote: 

Hi Sean,

Codons with an N in any position are represented with an X (stop codons
are represented with a Z). Assemblies that are not well sequenced, such
as the gorilla (gorGor1), will have quite a few Ns (which are bases with
low quality scores) and, thus, quite a few Xs in the protein alignment
file. You can confirm this by viewing the gorGor1 assembly on our test
browser here:
http://genome-test.cse.ucsc.edu/cgi-bin/hgTracks?db=gorGor1. Please note
that tracks and data on the test server have not undergone formal
quality assurance. 
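
As a rough illustration of the convention described above (and not the
actual code used to build the alignment files), the Python sketch below
translates a DNA string codon by codon, writing X for any codon that
contains an N and Z for a stop codon. The function name translate_with_xz
and the use of the standard genetic code are assumptions made for this
example only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stops.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_with_xz(dna):
    """Translate DNA in frame 0: codons with an N become 'X', stops become 'Z'.

    Hypothetical helper for illustration; trailing bases that do not fill
    a complete codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        if "N" in codon:
            protein.append("X")  # ambiguous codon
        else:
            # Assumption: any other unrecognized codon is also shown as 'X'.
            aa = CODON_TABLE.get(codon, "X")
            protein.append("Z" if aa == "*" else aa)
    return "".join(protein)
```

For example, translate_with_xz("ATGNNNTAA") gives "MXZ": a methionine, an
ambiguous codon, and a stop.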

More information about this file format can be found here:
http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#FASTA. 

I hope this information is helpful. Please contact us again at
[email protected] if you have any further questions.

Best,
Mary
------------------
Mary Goldman
UCSC Bioinformatics Group



On 7/6/11 3:21 PM, Xiang Li wrote: 

Hi, Dear Support,
 
 
 
It would be easy to understand if the "X"es were at the end of a protein
sequence. However, could you please help me understand why there are so
many of them inside some sequences?
 
 
 
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/multiz46way/alignments/refGene.exonAA.fa.gz
 
 
 

        NM_000152_gorGor1_18_19 51 0 0 Supercontig_0039638:17387-17539+

NXIXNELVXVTSEGAGLQLQKVTVLGVATAPQQVXSNGVPVSNFTYSPDTK
 
--
 

        NM_001079803_gorGor1_18_19 51 0 0 Supercontig_0039638:17387-17539+

NXIXNELVXVTSEGAGLQLQKVTVLGVATAPQQVXSNGVPVSNFTYSPDTK
 
--
 

        NM_001079804_gorGor1_18_19 51 0 0 Supercontig_0039638:17387-17539+

NXIXNELVXVTSEGAGLQLQKVTVLGVATAPQQVXSNGVPVSNFTYSPDTK
 
 
 
 
 
There are more than 30,000 sequences with X like that.   Please help.
Thanks!
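
A count like the one above can be reproduced with a short script. The
Python sketch below is a minimal example, assuming each record is a
header line starting with ">" followed by one or more sequence lines.
The function names iter_fasta and count_records_with are hypothetical
and chosen for this illustration.

```python
def iter_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA text lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def count_records_with(lines, residue="X"):
    """Count records whose sequence contains the given residue letter."""
    return sum(1 for _, seq in iter_fasta(lines) if residue in seq)


# Usage, assuming a local copy of the downloaded file:
#   import gzip
#   with gzip.open("refGene.exonAA.fa.gz", "rt") as fh:
#       print(count_records_with(fh, "X"))
```

Passing "Z" instead of "X" counts the records that contain a stop codon.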
 
 
 
Sean
 
 
 
Sean (Xiang) Li, Ph.D
 
Bioinformatics Scientist
 
Ambry Genetics
 
[email protected]
 
Direct 949-900-5504
 
Fax 949-900-5501
 
 
 
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome