Hello Yunfei,
Only refGene entries with annotated 5' UTRs (where txStart differs from
cdsStart, or, for minus-strand genes, where txEnd differs from cdsEnd)
appear in the upstream files. Since not all refGene entries have this
annotation, the two will not match up perfectly.
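The selection rule above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the script UCSC uses: it assumes refGene.txt has the usual genePred layout with a leading `bin` column (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd in columns 2 to 8), 0-based half-open coordinates, and that "upstream" means the 1000 bases immediately 5' of the transcription start on the gene's own strand. The extra rule that skips entries with no CDS is a hypothetical addition of this sketch.

```python
"""Sketch: keep refGene rows with an annotated 5' UTR and compute upstream windows.

Assumptions (not confirmed by UCSC): genePred layout with a leading `bin`
column, 0-based half-open coordinates, and "upstream" = the `size` bases
immediately 5' of the transcription start on the gene's strand.
"""
from typing import Iterable, Iterator, NamedTuple

UPSTREAM = 1000


class Upstream(NamedTuple):
    name: str
    chrom: str
    strand: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def has_5utr(strand: str, tx_start: int, tx_end: int,
             cds_start: int, cds_end: int) -> bool:
    # Hypothetical extra rule: entries without a CDS have no 5' UTR to speak of.
    if cds_start >= cds_end:
        return False
    if strand == "+":
        return tx_start != cds_start  # plus strand: 5' end is at txStart
    return tx_end != cds_end          # minus strand: 5' end is at txEnd


def upstream_regions(lines: Iterable[str], size: int = UPSTREAM) -> Iterator[Upstream]:
    for line in lines:
        f = line.rstrip("\n").split("\t")
        name, chrom, strand = f[1], f[2], f[3]
        tx_start, tx_end, cds_start, cds_end = map(int, f[4:8])
        if not has_5utr(strand, tx_start, tx_end, cds_start, cds_end):
            continue
        if strand == "+":
            # Window ends at txStart; clamp at the chromosome start.
            yield Upstream(name, chrom, strand, max(0, tx_start - size), tx_start)
        else:
            # Window starts at txEnd; its sequence must be reverse-complemented.
            yield Upstream(name, chrom, strand, tx_end, tx_end + size)
```

Cutting each window out of the chromosome FASTA files (and reverse-complementing the minus-strand windows) is left out; the point is only which rows survive the 5' UTR filter.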

You should also keep in mind that refGene is updated daily, while 
upstream1000.fa is updated weekly, so some discrepancies can arise from 
that as well.
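To see which accessions are actually missing from a given pair of downloads, one can take the set difference of the names in each file. This is a hedged sketch: it assumes (hypothetically) that every upstream1000.fa header begins with the accession followed by `_up_`, and that column 2 of refGene.txt holds the accession. Accessions it reports that do have an annotated 5' UTR may simply reflect the different update schedules of the two files.

```python
"""Sketch: list refGene accessions that have no record in upstream1000.fa.

Assumption (hypothetical, not confirmed by UCSC): each FASTA header's first
token starts with the accession followed by "_up_".
"""
from typing import Iterable, List, Set


def refgene_names(lines: Iterable[str]) -> Set[str]:
    # Column 2 (index 1) of refGene.txt holds the NM_/NR_ accession.
    return {line.split("\t")[1] for line in lines if line.strip()}


def upstream_names(fasta_lines: Iterable[str]) -> Set[str]:
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            token = line[1:].split()[0]
            names.add(token.split("_up_")[0])  # strip the assumed "_up_..." suffix
    return names


def missing_from_upstream(refgene_lines: Iterable[str],
                          fasta_lines: Iterable[str]) -> List[str]:
    # Sorted so the report is stable between runs.
    return sorted(refgene_names(refgene_lines) - upstream_names(fasta_lines))
```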

I hope this clears things up for you.
Best,
Antonio Coelho
UCSC Genome Bioinformatics Group

Li, Yunfei wrote:
> Hello
>
> I tried to generate a file like "upstream1000.fa.gz - Sequences 1000 bases
> upstream of annotated transcription starts for RefSeq genes with annotated 5'
> UTRs". Using "refGene.txt" to locate each refGene entry and the chromosome
> sequence files in "chromFaMasked.tar.gz", I can get a file very similar to
> "upstream1000.fa", but I found that some NM names that appear in "refGene.txt"
> are not contained in "upstream1000.fa", such as "NM_001166752, NM_053230, ...".
> Why does this happen?
>
> Could you please tell me what criterion you used to select refGene entries
> after locating each one and cutting its sequence from the chromosome?
>
> Best,
>
> Yunfei Li
> --------------------------------------------------------------------------------------
> Research Assistant
> Department of Statistics &
> School of Molecular Biosciences
> Biotechnology Life Sciences Building 427
> Washington State University
> Pullman, WA 99164-7520
> Phone: 509-339-5096
> http://www.wsu.edu/~ye_lab/people.html
>
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>   
