I've given up temporarily on biomart, and decided I should get my query
working on martshell first.

My first question: why is it that when I list datasets, I see
hsapiens_gene_ensembl, but not hsapiens_gene_ensembl_structure? Is it
somehow a sub-dataset? How am I supposed to know it exists if list
datasets doesn't show it?

I was excited to get results from a query, like this:
MartShell> using hsapiens_gene_ensembl get ensembl_transcript_id where
hgnc_symbol in (BRCA1, BRCA2);
ENST00000357654
ENST00000380152
ENST00000267071

My second question: the BRCA1 gene (ENSG00000012048) has a ton of
transcripts
(http://www.ensembl.org/Homo_sapiens/geneview?gene=ENSG00000012048) but
on the web page, most of them have an NP number, and only one,
ENST00000357654, is described as BRCA1. If I want all the exons for ANY
transcript of this gene, do I need to first query the gene ID, then
query all exons based on that gene ID? I thought that giving an HGNC
symbol would return anything associated with the GENE that has that
symbol.
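
To make the two-step idea concrete, here is a minimal Python sketch that
only builds the two MartShell statements as strings, following the
"using ... get ... where ... in (...)" form of my working query above.
It does not talk to any mart. The helper mql_query is my own invention,
and the names ensembl_gene_id and gene_stable_id are guesses on my part,
not names I have confirmed exist in either dataset.

```python
def mql_query(dataset, attributes, filter_name, values):
    """Build one statement shaped like the working query above:
    using <dataset> get <attributes> where <filter> in (<values>);
    """
    attrs = ", ".join(attributes)
    vals = ", ".join(values)
    return f"using {dataset} get {attrs} where {filter_name} in ({vals});"


# Step 1: HGNC symbol -> gene ID.
# "ensembl_gene_id" is an ASSUMED attribute name, by analogy with
# ensembl_transcript_id in the query that worked.
step1 = mql_query(
    "hsapiens_gene_ensembl", ["ensembl_gene_id"], "hgnc_symbol", ["BRCA1"]
)

# Step 2: gene ID -> exon IDs for every transcript of that gene.
# "gene_stable_id" is a HYPOTHETICAL filter name; the real one is
# exactly what I am asking about.
step2 = mql_query(
    "hsapiens_gene_ensembl_structure",
    ["exon_id"],
    "gene_stable_id",
    ["ENSG00000012048"],
)

print(step1)
print(step2)
```

If the structure dataset accepted an HGNC symbol filter directly, the
first step would be unnecessary, which is what I had expected.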

When I tried to query based on the one transcript ID I had, every
attempt came back with no results:
MartShell> use hsapiens_gene_ensembl_structure get exon_id where
transcript_id in (ENST00000357654);
MartShell> use hsapiens_gene_ensembl_structure get exon_id where
stable_transcript_id in (ENST00000357654);
MartShell> use hsapiens_gene_ensembl_structure get exon_id where
str_transcript_id in (ENST00000357654);
MartShell>

Now, from ensembl.org, it's clear that there are 23 exons with this
transcript id. So my third question is, what am I doing wrong here?

While I'm at it, what's the difference between transcript_id,
stable_transcript_id, and str_transcript_id (same question for gene IDs)
and how do I know which filters in hsapiens_gene_ensembl_structure match
up with attributes in hsapiens_gene_ensembl?

I'd better stop before I ask too many questions.

- Amir Karger
Computational Biology Group
Bauer Center for Genomics Research
Harvard University
617-496-0626
