These are some of the sequences I extracted from the alignment file (I removed all the gap characters, ---, from the sequences). I translated these CDS sequences in all six frames and found stop codons in every frame for three of them. NM_002099_gorGor1 translates without a stop codon only in frame 3, i.e. starting at the 3rd position. Can you check these sequences?

>NM_002099_gorGor1 [Only frame 3 can be translated without a stop codon]
AAATTGTGAGCATATCAGCATGGAGTACCACTGAGGTGGCANTGCACACTTCAACCTCTTCTTCAGTCACAAAGAGTTACATCTCATCACAGACAAATGNTAAGCACAAACGGGACACATATNCAGCCCCTCCTAGAGCTCATGAAGTTTCAGAAATTTCTGTTACAACTGTTTACCCTNCAGAAGAGGATGACGGAGAAACGGGACAACTTGTCCATCGTTTCACTGTACCAGTGATAATACTCATTATTTTGTGTTTGATGGCTGGTGTTATTGGAANGATCCTGTTAATTTCTTACAGTATTCTCCGACTGATAAAGAGACAAGTGATCAATGA
>NM_002099_speTri1 [All six frames have a stop codon]
ATGTATGAGAGAATAACAGTTGGATTACTATTGTCAGGtttatcctcctgaagagataggcaGAAGAAATCAAATTATACACCCTTTCTCTGAACCAGTGATAATATTAATTATTTTTGCGGTAATGTTTGGCATCATTGGAACAATCCTTTTAATTTCTTTCTGTATCAGGCGACTGGTCAAGAAAAGTCCACCAGTCATAAAACCTGTCTCCTTGGAAGACACAGACTTGCCTTTAAGTTCTGCTGAAATGGGACAGACAGAGAATAACCAAAGA
>NM_002099_cavPor3 [All six frames have a stop codon]
ATGTACGACAAAATAACAATCGCACTGCTGTTGGCAGAGTCAGCCTACTCTTTCAACTGAAGAAGCTGCCGTGACTCCAGGAGCAAGACAGCAAATTGCCCACATGTTCTCGGAACCAGTGATGATAGCTATTATCTTGGGGGCGATAGCTGTTATTGTTGGAGTCATCCTCTCCTTTGCAGTCTGTATCCGGCTACTGACAAGGAAATCTCCAATTAGCAAGCCACCTCCCTTGGAAGACACAGGCGAACCTTTAAATTCTGTTGAAGTAGTACATACAGAGAAGAGTGATCAATGA
>NM_002099_echTel1 [All six frames have a stop codon]
AAACAGTGAAACAGATTGACTTCCCTTTCTCAGGACCAGTGACAGCACTCATTATCTTTGGAGTGATGGCTGGTATCATTACCATTATTCTCTTACCCAGTTACTGTATTAGTCGCCTGAGAAAGAGAGGACAGTGATGTACAACCTCCGTTGTGACAGGTACACCTTAAGGTTTTATTGAAGGAAAGAAATCATTGA

On Fri, Oct 30, 2009 at 10:30 AM, Jim Kent <[email protected]> wrote:

> I'm not seeing stop codons in NM_002099. Did you remember to reverse
> complement since it's on the negative strand?
> There are some cases (42) where there are stop codons because of
> selenocysteine, but it's rare, and NM_002099 is not one of them.
>
>
> On Oct 30, 2009, at 9:53 AM, zhuocheng Hou wrote:
>
> On Fri, Oct 30, 2009 at 12:52 AM, zhuocheng Hou <[email protected]> wrote:
>>
>> Hi Everyone,
>>>
>>> I used the awk script provided by Brian (below) to concatenate all the
>>> exon alignments into one file. I am not familiar with awk, so I just
>>> copied the script and ran it on the sequence file directly, as suggested.
>>> I found some strange things in the results.
>>>
>>> (1) I found lots of stop codons in the CDS sequences, e.g., NM_002099,
>>> NM_2193; this is widespread in the exon alignment file. I used the
>>> refGene.exonnuc.fa file.
>>> (2) I don't know how the genome browser group generated the 44way refseq
>>> exon alignment file. I found some duplicates in the sequence file, e.g.,
>>> NM_001320.
>>>
>>> Can anyone explain a little about these two questions?
>>>
>>> Thanks,
>>> Zhuocheng
>>>
>>>
>>> On Thu, Oct 29, 2009 at 5:34 PM, Brian Raney <[email protected]> wrote:
>>>
>>> Hey Zhuocheng,
>>>>
>>>> There are a couple of ways you can get the full CDS for refSeq genes for
>>>> all the species with aligning sequence in the 44way.
>>>>
>>>> If you have a small set of genes you're interested in, the easiest way
>>>> would be to use the table browser. If you want the full set of genes
>>>> represented in the refSeq set, then you can parse the download file by
>>>> concatenating the exons. I'll describe both these methods below.
>>>>
>>>> First, the format of the entries in the CDS FASTA data set, and how to
>>>> get them out of the table browser, is described here:
>>>> http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#FASTA
>>>>
>>>> If you're not familiar with using the Table Browser, you can read the
>>>> tutorial here:
>>>> http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html
>>>>
>>>> Secondly, if you want the whole CDS from the exon only downloads you can
>>>> just concatenate all the exons for a particular gene together. I include
>>>> an awk script below which does this (WARNING: awk script not validated by
>>>> our QA dept. Use at your own risk).
>>>>
>>>> If this doesn't answer your question, feel free to write back to this
>>>> list.
>>>>
>>>> Brian Raney
>>>>
>>>> ---
>>>>
>>>> To run script:
>>>>
>>>> $ zcat refGene.exonAA.fa.gz | awk -f awk.script
>>>>
>>>> where awk.script is a file with the following in it:
>>>>
>>>> />/ {
>>>>     geneSpecies=$1; gsub("_[0-9]+_[0-9]+","",geneSpecies);
>>>>     species=geneSpecies; gsub(".+_","", species);
>>>>     speciesList[species]=1;
>>>>     gene=geneSpecies; gsub("_" species,"",gene);
>>>>     if (geneBuf[species] != gene)
>>>>     {
>>>>         if (geneBuf[species] != "")
>>>>             print geneBuf[species] "_" species, size[species] "\n" sequence[species];
>>>>         geneBuf[species]=gene; sequence[species]=""; size[species]=$2
>>>>     }
>>>>     else
>>>>         {size[species] += $2}
>>>> }
>>>>
>>>> /^[A-Z-]/ {sequence[species] = sequence[species] $1}
>>>>
>>>> END {for(ii in speciesList)
>>>>     print geneBuf[ii] "_" ii, size[ii] "\n" sequence[ii];
>>>> }
>>>>
>>>>
>>>> On Thu, Oct 29, 2009 at 11:27 AM, zhuocheng Hou <[email protected]> wrote:
>>>>
>>>>> Hi Everyone,
>>>>>
>>>>> I want to extract the CDS region from the 44way_refseq alignment file.
>>>>> However, this alignment is based on exons. Can anyone give some
>>>>> information about how to link these exons into a full CDS?
>>>>>
>>>>> The sequence names in the file look like this: NM_001077470_hg18_1_7.
>>>>> What is the meaning of the _1_7?
>>>>>
>>>>> Thanks
>>>>> Zhuocheng
>>>>> _______________________________________________
>>>>> Genome maillist - [email protected]
>>>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>>>>
>>>
>>> --
>>> Zhuocheng Hou, Ph.D.
>>> PRB/NICHD/NIH
>>> Wayne State University School of Medicine
>>> 540 E. Canfield Avenue
>>> Detroit, MI 48201
>>>
>>
>> --
>> Zhuocheng Hou, Ph.D.
>> PRB/NICHD/NIH
>> Wayne State University School of Medicine
>> 540 E. Canfield Avenue
>> Detroit, MI 48201
>> _______________________________________________
>> Genome maillist - [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>

--
Zhuocheng Hou, Ph.D.
PRB/NICHD/NIH
Wayne State University School of Medicine
540 E. Canfield Avenue
Detroit, MI 48201
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
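
For anyone who wants to repeat the six-frame check discussed in this thread, here is a minimal Python sketch. It is not the script anyone in the thread used. The function names (reverse_complement, translate, six_frame_stops) are made up for illustration, and the sketch assumes the standard genetic code, an input with alignment gaps already removed, that any codon containing N should be reported as X, and that a stop codon at the very end of a frame should not be counted.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumption: no alternative
# codon tables and no selenocysteine handling).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (case is preserved)."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become X.

    Soft-masked (lowercase) bases are upper-cased first, and a trailing
    partial codon is ignored.
    """
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_stops(seq, ignore_terminal=True):
    """Count stop codons in each of the six reading frames.

    Keys are "+1".."+3" for the given strand and "-1".."-3" for the
    reverse complement. With ignore_terminal=True, a stop codon that ends
    the translated frame is not counted.
    """
    counts = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            protein = translate(s[offset:])
            if ignore_terminal and protein.endswith("*"):
                protein = protein[:-1]
            counts[f"{strand}{offset + 1}"] = protein.count("*")
    return counts


if __name__ == "__main__":
    # Toy sequence, not taken from the alignment file.
    for frame, n in six_frame_stops("ATGAAATGA").items():
        print(frame, n)
```

To apply this to the sequences above, pass each FASTA sequence string (without its header line) to six_frame_stops and look for a frame whose count is zero. Checking the "-" frames is one way to follow up on the reverse-complement question raised in the thread.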
