Re: [Genome] Wrong exon alignment or wrong scripts?

zhuocheng Hou Fri, 30 Oct 2009 08:04:26 -0700

These are partial sequences extracted from the alignment file (I removed all
the --- gap characters from the sequences). I translated these CDS sequences
in all six frames and found stop codons in every frame for three of them. For
NM_002099_gorGor1, the open reading frame starts at the 3rd position. Can you
check these sequences?



>NM_002099_gorGor1 [Only frame 3 can be translated without a stop codon]
AAATTGTGAGCATATCAGCATGGAGTACCACTGAGGTGGCANTGCACACTTCAACCTCTTCTTCAGTCACAAAGAGTTACATCTCATCACAGACAAATGNTAAGCACAAACGGGACACATATNCAGCCCCTCCTAGAGCTCATGAAGTTTCAGAAATTTCTGTTACAACTGTTTACCCTNCAGAAGAGGATGACGGAGAAACGGGACAACTTGTCCATCGTTTCACTGTACCAGTGATAATACTCATTATTTTGTGTTTGATGGCTGGTGTTATTGGAANGATCCTGTTAATTTCTTACAGTATTCTCCGACTGATAAAGAGACAAGTGATCAATGA
>NM_002099_speTri1 [All six frames have a stop codon]
ATGTATGAGAGAATAACAGTTGGATTACTATTGTCAGGtttatcctcctgaagagataggcaGAAGAAATCAAATTATACACCCTTTCTCTGAACCAGTGATAATATTAATTATTTTTGCGGTAATGTTTGGCATCATTGGAACAATCCTTTTAATTTCTTTCTGTATCAGGCGACTGGTCAAGAAAAGTCCACCAGTCATAAAACCTGTCTCCTTGGAAGACACAGACTTGCCTTTAAGTTCTGCTGAAATGGGACAGACAGAGAATAACCAAAGA
>NM_002099_cavPor3 [All six frames have a stop codon]
ATGTACGACAAAATAACAATCGCACTGCTGTTGGCAGAGTCAGCCTACTCTTTCAACTGAAGAAGCTGCCGTGACTCCAGGAGCAAGACAGCAAATTGCCCACATGTTCTCGGAACCAGTGATGATAGCTATTATCTTGGGGGCGATAGCTGTTATTGTTGGAGTCATCCTCTCCTTTGCAGTCTGTATCCGGCTACTGACAAGGAAATCTCCAATTAGCAAGCCACCTCCCTTGGAAGACACAGGCGAACCTTTAAATTCTGTTGAAGTAGTACATACAGAGAAGAGTGATCAATGA
>NM_002099_echTel1 [All six frames have a stop codon]
AAACAGTGAAACAGATTGACTTCCCTTTCTCAGGACCAGTGACAGCACTCATTATCTTTGGAGTGATGGCTGGTATCATTACCATTATTCTCTTACCCAGTTACTGTATTAGTCGCCTGAGAAAGAGAGGACAGTGATGTACAACCTCCGTTGTGACAGGTACACCTTAAGGTTTTATTGAAGGAAAGAAATCATTGA
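
For anyone repeating this check, the six-frame scan described above can be
sketched roughly as follows. This is only an illustrative Python sketch, not
the script used in this thread: the function names are made up for the
example, it assumes the standard stop codons (TAA, TAG, TGA), it treats a stop
at the very end of a frame as the normal terminator, and it strips "-" gap
characters before scanning.

```python
# Illustrative sketch; names below are hypothetical, not from any UCSC tool.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumes the standard genetic code
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def first_stop(seq, offset):
    """Return the 0-based index of the first internal stop codon, or None.

    The last complete codon of the frame is skipped, because a full CDS
    is expected to end with a stop codon.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
    for n, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return offset + 3 * n
    return None


def six_frame_stops(seq):
    """Map frame labels (+1..+3, -1..-3) to the first internal stop index."""
    seq = seq.replace("-", "")  # drop alignment gap characters
    result = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            result[f"{strand}{offset + 1}"] = first_stop(s, offset)
    return result
```

A frame whose value is None has no internal stop codon. Frames -1 to -3 are
read from the reverse complement, so positions reported for them are indices
into the reverse-complemented string.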





On Fri, Oct 30, 2009 at 10:30 AM, Jim Kent <[email protected]> wrote:

> I'm not seeing stop codons in NM_002099.  Did you remember to reverse
> complement since it's on the negative strand?
> There are some cases (42) where there are stop codons because of
> selenocysteine, but it's rare, and NM_002099 is not one of them.
>
>
> On Oct 30, 2009, at 9:53 AM, zhuocheng Hou wrote:
>
>  On Fri, Oct 30, 2009 at 12:52 AM, zhuocheng Hou <[email protected]> wrote:
>>
>>  Hi Everyone,
>>>
>>> I used the awk script provided by Brian (below) to concatenate all the
>>> exon alignments into one file. I am not familiar with awk, so I just
>>> copied the script and ran it on the sequence file directly, as suggested.
>>> I found some strange things in the results.
>>>
>>> (1) I found lots of stop codons in the CDS sequences, e.g., NM_002099 and
>>> NM_2193; this seems to be widespread in the exon alignment file.
>>> I used the refGene.exonnuc.fa file.
>>> (2) I don't know how the Genome Browser group generated the 44way refseq
>>> exon alignment file. I found some duplicates in the sequence file, e.g.,
>>> NM_001320.
>>>
>>> Can anyone explain a little about these two questions?
>>>
>>> Thanks,
>>> Zhuocheng
>>>
>>>
>>>
>>>
>>> On Thu, Oct 29, 2009 at 5:34 PM, Brian Raney <[email protected]>
>>> wrote:
>>>
>>>  Hey Zhuocheng,
>>>>
>>>> There are a couple of ways you can get the full CDS for refSeq genes for
>>>> all the species with aligning sequence in the 44way.
>>>>
>>>> If you have a small set of genes you're interested in, the easiest way
>>>> would be to use the table browser.  If you want the full set of genes
>>>> represented in the refSeq set, then you can parse the download file by
>>>> concatenating the exons.  I'll describe both these methods below.
>>>>
>>>> First, the format of the entries in the CDS FASTA data set, and how to
>>>> get them out of the table browser, is described here:
>>>> http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#FASTA
>>>>
>>>> If you're not familiar with using the Table Browser, you can read the
>>>> tutorial here:
>>>> http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html
>>>>
>>>> Secondly, if you want the whole CDS from the exon-only downloads you can
>>>> just concatenate all the exons for a particular gene together.  I
>>>> include an awk script below which does this (WARNING: awk script not
>>>> validated by our QA dept. Use at your own risk).
>>>>
>>>> If this doesn't answer your question, feel free to write back to this
>>>> list.
>>>>
>>>> Brian Raney
>>>>
>>>> ---
>>>>
>>>> To run script:
>>>>
>>>> $ zcat refGene.exonAA.fa.gz | awk -f awk.script
>>>>
>>>> where awk.script is a file with the following in it:
>>>>
>>>> />/ {
>>>> geneSpecies=$1;gsub("_[0-9]+_[0-9]+","",geneSpecies);
>>>> species=geneSpecies; gsub(".+_","", species);
>>>> speciesList[species]=1;
>>>> gene=geneSpecies;gsub("_" species,"",gene);
>>>> if (geneBuf[species] != gene)
>>>>  {
>>>>  if (geneBuf[species] != "")
>>>>      print geneBuf[species] "_" species, size[species] "\n" sequence[species];
>>>>  geneBuf[species]=gene; sequence[species]=""; size[species]=$2
>>>>  }
>>>> else
>>>>  {size[species] += $2}
>>>> }
>>>>
>>>> /^[A-Za-z-]/ {sequence[species] = sequence[species] $1}
>>>>
>>>> END {for(ii in speciesList)
>>>>      print geneBuf[ii] "_" ii, size[ii] "\n" sequence[ii];
>>>>  }
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Oct 29, 2009 at 11:27 AM, zhuocheng Hou <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>> Hi Everyone,
>>>>>
>>>>> I want to extract the CDS region from the 44way_refseq alignment file.
>>>>> However, this alignment is exon-based. Can anyone give me some
>>>>> information about how to link these exons into a full CDS?
>>>>>
>>>>> The sequence names look like this: NM_001077470_hg18_1_7. What is the
>>>>> meaning of the _1_7?
>>>>>
>>>>> Thanks
>>>>> Zhuocheng
>>>>> _______________________________________________
>>>>> Genome maillist  -  [email protected]
>>>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> Zhuocheng Hou, Ph.D.
>>> PRB/NICHD/NIH
>>> Wayne State University School of Medicine
>>> 540 E. Canfield Avenue
>>> Detroit, MI 48201
>>>
>>>
>>
>>
>>
>
>


-- 
Zhuocheng Hou, Ph.D.
PRB/NICHD/NIH
Wayne State University School of Medicine
540 E. Canfield Avenue
Detroit, MI 48201
