Re: [Genome] Wrong exon alignment or wrong scripts?

Jim Kent Fri, 30 Oct 2009 09:11:51 -0700

Hmm, maybe I didn't understand your question.  I thought you were  
talking about stop codons in the human sequence for the gene.


In other species there will certainly be stop codons in many cases.
Some of these will be due to sequencing errors, particularly in the 2x
assemblies.  Some will be due to alignment problems, particularly in
regions where the "self chains" show a rich local repeat structure,
which can confound the aligners.  Some will be due to genuine
evolution.  In this case it does seem to be a rapidly evolving
region.  It does not even seem to be fixed in human.  Observe the
numerous non-synonymous codon substitutions in red in the human mRNA
track, and the actual dip in conservation over the exons.
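
For anyone who wants to repeat the six-frame check discussed in this
thread, here is a minimal Python sketch. It is only an illustration, not
a UCSC tool: the function names are hypothetical, it assumes the standard
stop codons (TAA, TAG, TGA), and it makes no attempt to recognize
selenocysteine TGA codons. The minus-strand frames are read from the
reverse complement, and a stop codon that ends the sequence is treated as
the normal terminator rather than as a problem.

```python
# Sketch: count in-frame stop codons in all six reading frames of a CDS.
# Assumes the standard genetic code; names below are illustrative only.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def internal_stops(seq, offset):
    """Count stop codons in one frame, ignoring a stop that ends the sequence."""
    seq = seq.upper().replace("-", "")  # drop alignment gap characters
    codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
    ends_exactly = offset + 3 * len(codons) == len(seq)
    if codons and ends_exactly and codons[-1] in STOPS:
        codons = codons[:-1]  # terminal stop is the expected terminator
    return sum(codon in STOPS for codon in codons)


def six_frame_stops(seq):
    """Map frame labels (+1..+3, -1..-3) to internal stop codon counts."""
    rc = reverse_complement(seq)
    report = {}
    for offset in range(3):
        report[f"+{offset + 1}"] = internal_stops(seq, offset)
        report[f"-{offset + 1}"] = internal_stops(rc, offset)
    return report
```

Calling six_frame_stops() on a concatenated CDS and looking for a frame
with a count of zero is one way to see whether any clean reading frame
exists. Note that removing gap characters whose total length is not a
multiple of three shifts the frame, so stops found this way in a
non-human species may reflect the alignment rather than the gene.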

On Oct 30, 2009, at 11:02 AM, zhuocheng Hou wrote:

> These are some of the sequences I extracted from the alignment file
> (I removed all the --- gap characters from the sequences). I
> translated these CDS sequences in all six frames and found stop
> codons in three of them. For NM_002099_gorGor1, the reading frame
> starts at the 3rd position. Can you check these sequences?
>
>
> >NM_002099_gorGor1 [Only frame 3 can be translated without a stop codon]
> AAATTGTGAGCATATCAGCATGGAGTACCACTGAGGTGGCANTGCACACTTCAACCTCTTCTTCAGTCACAAAGAGTTACATCTCATCACAGACAAATGNTAAGCACAAACGGGACACATATNCAGCCCCTCCTAGAGCTCATGAAGTTTCAGAAATTTCTGTTACAACTGTTTACCCTNCAGAAGAGGATGACGGAGAAACGGGACAACTTGTCCATCGTTTCACTGTACCAGTGATAATACTCATTATTTTGTGTTTGATGGCTGGTGTTATTGGAANGATCCTGTTAATTTCTTACAGTATTCTCCGACTGATAAAGAGACAAGTGATCAATGA
> >NM_002099_speTri1 [All six frames have a stop codon]
> ATGTATGAGAGAATAACAGTTGGATTACTATTGTCAGGtttatcctcctgaagagataggcaGAAGAAATCAAATTATACACCCTTTCTCTGAACCAGTGATAATATTAATTATTTTTGCGGTAATGTTTGGCATCATTGGAACAATCCTTTTAATTTCTTTCTGTATCAGGCGACTGGTCAAGAAAAGTCCACCAGTCATAAAACCTGTCTCCTTGGAAGACACAGACTTGCCTTTAAGTTCTGCTGAAATGGGACAGACAGAGAATAACCAAAGA
> >NM_002099_cavPor3 [All six frames have a stop codon]
> ATGTACGACAAAATAACAATCGCACTGCTGTTGGCAGAGTCAGCCTACTCTTTCAACTGAAGAAGCTGCCGTGACTCCAGGAGCAAGACAGCAAATTGCCCACATGTTCTCGGAACCAGTGATGATAGCTATTATCTTGGGGGCGATAGCTGTTATTGTTGGAGTCATCCTCTCCTTTGCAGTCTGTATCCGGCTACTGACAAGGAAATCTCCAATTAGCAAGCCACCTCCCTTGGAAGACACAGGCGAACCTTTAAATTCTGTTGAAGTAGTACATACAGAGAAGAGTGATCAATGA
> >NM_002099_echTel1 [All six frames have a stop codon]
> AAACAGTGAAACAGATTGACTTCCCTTTCTCAGGACCAGTGACAGCACTCATTATCTTTGGAGTGATGGCTGGTATCATTACCATTATTCTCTTACCCAGTTACTGTATTAGTCGCCTGAGAAAGAGAGGACAGTGATGTACAACCTCCGTTGTGACAGGTACACCTTAAGGTTTTATTGAAGGAAAGAAATCATTGA
>
>
>
>
>
> On Fri, Oct 30, 2009 at 10:30 AM, Jim Kent <[email protected]> wrote:
> I'm not seeing stop codons in NM_002099.  Did you remember to  
> reverse complement since it's on the negative strand?
> There are some cases (42) where there are stop codons because of
> selenocysteine, but it's rare, and NM_002099 is not one of them.
>
>
> On Oct 30, 2009, at 9:53 AM, zhuocheng Hou wrote:
>
> On Fri, Oct 30, 2009 at 12:52 AM, zhuocheng Hou <[email protected]>  
> wrote:
>
> Hi Everyone,
>
> I used the awk script that Brian provided (see below) to concatenate
> all the exon alignments into one file. I am not familiar with awk, so
> I just copied the script and ran it on the sequence file as
> suggested. I found some strange things in the results.
>
> (1) I found many stop codons in the CDS sequences, e.g., NM_002099 and
> NM_2193; this seems to be widespread in the exon alignment file. I
> used the refGene.exonnuc.fa file.
> (2) I don't know how the Genome Browser group generated the 44way
> refseq exon alignment file. I found some duplicates in the sequence
> file, e.g., NM_001320.
>
> Can anyone explain a little about these two questions?
>
> Thanks,
> Zhuocheng
>
>
>
>
> On Thu, Oct 29, 2009 at 5:34 PM, Brian Raney <[email protected]>  
> wrote:
>
> Hey Zhuocheng,
>
> There are a couple of ways you can get the full CDS for refSeq genes
> for all the species with aligning sequence in the 44way.
>
> If you have a small set of genes you're interested in, the easiest way
> would be to use the table browser.  If you want the full set of genes
> represented in the refSeq set, then you can parse the download file by
> concatenating the exons.  I'll describe both these methods below.
>
> First, the format of the entries in the CDS FASTA data set, and how
> to get them out of the table browser, is described here:
> http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#FASTA
>
> If you're not familiar with using the Table Browser, you can read the
> tutorial here:
> http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html
>
> Secondly, if you want the whole CDS from the exon-only downloads, you
> can just concatenate all the exons for a particular gene together.  I
> include an awk script below which does this (WARNING: awk script not
> validated by our QA dept. Use at your own risk).
>
> If this doesn't answer your question, feel free to write back to this
> list.
>
> Brian Raney
>
> ---
>
> To run script:
>
> $ zcat refGene.exonAA.fa.gz | awk -f awk.script
>
> where awk.script is a file with the following in it:
>
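> # Header lines look like ">NM_001077470_hg18_1_7 ..."; the second
> # field is accumulated as the size.
> /^>/ {
> # Drop the trailing _<number>_<number> to leave ">gene_species".
> geneSpecies=$1;gsub("_[0-9]+_[0-9]+","",geneSpecies);
> species=geneSpecies; gsub(".+_","", species);
> speciesList[species]=1;
> gene=geneSpecies;gsub("_" species,"",gene);
> if (geneBuf[species] != gene)
>  {
>  # New gene for this species: print the previous one, then reset.
>  if (geneBuf[species] != "")
>      print geneBuf[species] "_" species, size[species] "\n" sequence[species];
>  geneBuf[species]=gene; sequence[species]=""; size[species]=$2
>  }
> else
>  {size[species] += $2}
> }
>
> # Sequence lines: append to the current species. Lowercase is accepted
> # so that lines starting with lowercase bases are not skipped.
> /^[A-Za-z-]/ {sequence[species] = sequence[species] $1}
>
> # Print the last gene seen for each species.
> END {for(ii in speciesList)
>      print geneBuf[ii] "_" ii, size[ii] "\n" sequence[ii];
>  }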
>
>
>
>
> On Thu, Oct 29, 2009 at 11:27 AM, zhuocheng Hou <[email protected]>  
> wrote:
>
> Hi Everyone,
>
> I want to extract the CDS regions from the 44way_refseq alignment
> file. However, this alignment is organized by exon. Can anyone give
> me some information about this file and how to link these exons into
> full CDSs?
>
> The sequence names look like this: NM_001077470_hg18_1_7. What is the
> meaning of the _1_7?
>
> Thanks
> Zhuocheng
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>
>
>
>
>
> --
> Zhuocheng Hou, Ph.D.
> PRB/NICHD/NIH
> Wayne State University School of Medicine
> 540 E. Canfield Avenue
> Detroit, MI 48201

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
