Dear all,
If I issue this query:
<Dataset name = "rnorvegicus_gene_ensembl" interface = "default" >
<Filter name = "upstream_flank" value = "800"/>
<Filter name = "with_entrezgene" excluded = "0"/>
<Filter name = "transcript_status" value = "KNOWN"/>
<Attribute name = "gene_stable_id" />
<Attribute name = "5utr_start" />
<Attribute name = "5utr_end" />
<Attribute name = "5utr" />
</Dataset>
then for some of the genes (namely those with no values for
5utr_start and 5utr_end) I get "Sequence unavailable" (often) or
"No UTR is annotated for this transcript" (seldom) instead of a sequence.
When I check these genes manually by requesting "Flank (gene)" with an
upstream flank of 800, I do get 800 bp of sequence for every one of
them, in both the "Sequence unavailable" and the "No UTR is annotated
for this transcript" cases.
So the question is: if I ask for 5'UTR+upstream-800, shouldn't I
always be getting at least 800 bp of upstream sequence, irrespective
of the absence/presence of the 5'UTR?
If not, then why? :)
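For reference, the manual check described above should correspond to a query along the following lines. This is only a sketch: the dataset and filters are copied from the query at the top, while the attribute name "gene_flank" is an assumption for what the "Flank (gene)" option maps to and may differ in your mart version.

```xml
<!-- Sketch of the manual check: gene flank with 800 bp upstream. -->
<!-- "gene_flank" is an assumed attribute name for the "Flank (gene)" option. -->
<Dataset name = "rnorvegicus_gene_ensembl" interface = "default" >
<Filter name = "upstream_flank" value = "800"/>
<Filter name = "with_entrezgene" excluded = "0"/>
<Filter name = "transcript_status" value = "KNOWN"/>
<Attribute name = "gene_stable_id" />
<Attribute name = "gene_flank" />
</Dataset>
```

The only difference from the first query is the requested sequence type, so any gene that returns 800 bp here but "Sequence unavailable" there points at the 5' UTR handling rather than at the flank filter.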
Thanks in advance for explanations,
--
Sincerely yours,
Bogdan Tokovenko,
PhD student at the Laboratory of Protein Biosynthesis,
Department of Genetic Information Translation Mechanisms,
Institute of Molecular Biology and Genetics, Kyiv, Ukraine
http://bogdan.org.ua/