I'm not seeing stop codons in NM_002099.  Did you remember to reverse
complement, since it's on the negative strand?
There are some cases (42) where there are stop codons because of
selenocysteine, but that's rare, and NM_002099 is not one of them.
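
In case it helps, here is a rough Python sketch of the check I mean. It is only an illustration: the helper names and the toy sequence are made up, and it assumes the concatenated sequence is plain DNA with '-' for alignment gaps. It reverse complements a sequence and lists any in-frame stop codons other than the final one.

```python
# Hypothetical helpers for illustration only; not part of any UCSC tool.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (gap characters are kept)."""
    return seq.translate(COMPLEMENT)[::-1]


def internal_stops(cds):
    """Return 0-based codon indexes of in-frame stops, ignoring the final codon."""
    seq = cds.replace("-", "").upper()  # drop alignment gaps before reading codons
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


# Toy coding-strand sequence (made up): ATG TTA TGG TAA
coding = "ATGTTATGGTAA"
unflipped = reverse_complement(coding)  # how it reads on the opposite strand

print(internal_stops(unflipped))                      # -> [2]
print(internal_stops(reverse_complement(unflipped)))  # -> []
```

The toy case shows how a sequence read on the wrong strand can appear to contain a stop codon (here TTA reads as TAA) that goes away once it is reverse complemented.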

On Oct 30, 2009, at 9:53 AM, zhuocheng Hou wrote:

> On Fri, Oct 30, 2009 at 12:52 AM, zhuocheng Hou <[email protected]>  
> wrote:
>
>> Hi Everyone,
>>
>> I used the awk script provided by Brian (below) to concatenate all the
>> exon alignments into one file. I am not familiar with awk, so I just ran
>> the script on the sequence file directly, as suggested. I found some
>> strange things in the results.
>>
>> (1) I found lots of stop codons in the CDS sequences, e.g., NM_002099 and
>> NM_2193. This seems to be widespread in the exon alignment file. I used
>> the refGene.exonnuc.fa file.
>> (2) I don't know how the Genome Browser group generated the 44way refseq
>> exon alignment file. I found some duplicates in the sequence file, e.g.,
>> NM_001320.
>>
>> Can anyone explain a little about these two questions?
>>
>> Thanks,
>> Zhuocheng
>>
>>
>>
>>
>> On Thu, Oct 29, 2009 at 5:34 PM, Brian Raney <[email protected]>  
>> wrote:
>>
>>> Hey Zhuocheng,
>>>
>>> There are a couple of ways you can get the full CDS for refSeq genes for
>>> all the species with aligning sequence in the 44way.
>>>
>>> If you have a small set of genes you're interested in, the easiest way
>>> would be to use the table browser.  If you want the full set of genes
>>> represented in the refSeq set, then you can parse the download file by
>>> concatenating the exons.  I'll describe both these methods below.
>>>
>>> First, the format of the entries in the CDS FASTA data set, and how to
>>> get them out of the table browser, is described here:
>>> http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#FASTA
>>>
>>> If you're not familiar with using the Table Browser, you can read the
>>> tutorial here:
>>> http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html
>>>
>>> Secondly, if you want the whole CDS from the exon-only downloads, you can
>>> just concatenate all the exons for a particular gene together.  I include
>>> an awk script below which does this (WARNING: awk script not validated by
>>> our QA dept. Use at your own risk).
>>>
>>> If this doesn't answer your question, feel free to write back to this
>>> list.
>>>
>>> Brian Raney
>>>
>>> ---
>>>
>>> To run script:
>>>
>>> $ zcat refGene.exonAA.fa.gz | awk -f awk.script
>>>
>>> where awk.script is a file with the following in it:
>>>
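>>> # Header lines, e.g. ">NM_001077470_hg18_1_7 ..."; field 2 is summed as size.
>>> />/ {
>>>   # Strip the trailing _<number>_<number> suffix, leaving ">gene_species".
>>>   geneSpecies=$1; gsub("_[0-9]+_[0-9]+","",geneSpecies);
>>>   # The species is whatever follows the last underscore.
>>>   species=geneSpecies; gsub(".+_","", species);
>>>   speciesList[species]=1;
>>>   gene=geneSpecies; gsub("_" species,"",gene);
>>>   if (geneBuf[species] != gene)
>>>     {
>>>     # New gene for this species: print the finished one, then reset.
>>>     if (geneBuf[species] != "")
>>>       print geneBuf[species] "_" species, size[species] "\n" sequence[species];
>>>     geneBuf[species]=gene; sequence[species]=""; size[species]=$2
>>>     }
>>>   else
>>>     {size[species] += $2}
>>> }
>>>
>>> # Sequence lines: append to the buffer of the current species.
>>> /^[A-Z-]/ {sequence[species] = sequence[species] $1}
>>>
>>> # Flush the last gene seen for each species.
>>> END {for(ii in speciesList)
>>>       print geneBuf[ii] "_" ii, size[ii] "\n" sequence[ii];
>>>   }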
>>>
>>>
>>>
>>>
>>> On Thu, Oct 29, 2009 at 11:27 AM, zhuocheng Hou <[email protected]>  
>>> wrote:
>>>>
>>>> Hi Everyone,
>>>>
>>>> I want to extract the CDS region from the 44way_refseq alignment file.
>>>> However, this alignment is based on exons. Can anyone give some
>>>> information about this file and how to link these exons into a full CDS?
>>>>
>>>> The sequence names look like this: NM_001077470_hg18_1_7. What is the
>>>> meaning of the _1_7?
>>>>
>>>> Thanks
>>>> Zhuocheng
>>>> _______________________________________________
>>>> Genome maillist  -  [email protected]
>>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>>
>>>
>>>
>>
>>
>> --
>> Zhuocheng Hou, Ph.D.
>> PRB/NICHD/NIH
>> Wayne State University School of Medicine
>> 540 E. Canfield Avenue
>> Detroit, MI 48201
>>
