On 11 May 2006, at 17:16, Amir Karger wrote:

I've given up temporarily on biomart, and decided I should get my query
working on martshell first.

My first question: why is it that when I list datasets, I see
hsapiens_gene_ensembl, but not hsapiens_gene_ensembl_structure? Is it
somehow a sub-dataset? How am I supposed to know it exists if list
datasets doesn't show it?


You can't query invisible datasets directly in MartShell (just as you cannot
in MartView). The available datasets are listed below:

MartShell> list datasets;

agambiae_gene_ensembl
amellifera_gene_ensembl
btaurus_gene_ensembl
celegans_gene_ensembl
cfamiliaris_gene_ensembl
cintestinalis_gene_ensembl
dmelanogaster_gene_ensembl
drerio_gene_ensembl
frubripes_gene_ensembl
ggallus_gene_ensembl
hsapiens_gene_ensembl
mdomestica_gene_ensembl
mmulatta_gene_ensembl
mmusculus_gene_ensembl
ptroglodytes_gene_ensembl
rnorvegicus_gene_ensembl
scerevisiae_gene_ensembl
tnigroviridis_gene_ensembl
xtropicalis_gene_ensembl

(only visible datasets are listed)




I was excited to get results from a query, like this:
MartShell> using hsapiens_gene_ensembl get ensembl_transcript_id where
hgnc_symbol in (BRCA1, BRCA2);
ENST00000357654
ENST00000380152
ENST00000267071
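
(A minimal sketch, in Python, of how a query of the shape shown above could be assembled as a string before pasting it into MartShell. The function name build_martshell_query and its parameters are hypothetical, used here only for illustration; the sketch mirrors the "using ... get ... where ... in (...)" form of the working query above and does not check that the dataset, attribute, or filter actually exists.)

```python
from typing import Sequence


def build_martshell_query(dataset: str, attribute: str,
                          filter_name: str, values: Sequence[str]) -> str:
    """Assemble a query string in the same form as the working query above.

    Hypothetical helper: it only formats text and performs no validation
    against what MartShell really offers.
    """
    # Join the filter values as a comma-separated list: (BRCA1, BRCA2)
    value_list = ", ".join(values)
    return f"using {dataset} get {attribute} where {filter_name} in ({value_list});"


if __name__ == "__main__":
    print(build_martshell_query(
        "hsapiens_gene_ensembl",
        "ensembl_transcript_id",
        "hgnc_symbol",
        ["BRCA1", "BRCA2"],
    ))
```

Run "list datasets" and "list attributes" in MartShell first to confirm the names you pass in.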

My second question: the BRCA1 gene (ENSG00000012048) has a ton of
transcripts
(http://www.ensembl.org/Homo_sapiens/geneview?gene=ENSG00000012048) but
on the web page, most of them have an NP number, and only one,
ENST00000357654, is described as BRCA1. If I want all the exons for ANY
transcript of this gene, do I need to first query the gene ID, then
query all exons based on that gene ID? I thought that giving an HGNC
symbol would return anything associated with the GENE that has that
symbol.

I agree that this would be more intuitive, but Ensembl maps these entries per
transcript rather than per gene. If you want more details about this mapping,
you should contact the Ensembl helpdesk ([EMAIL PROTECTED]).



When I tried to query based on the one transcript ID I had, it failed:
MartShell> use hsapiens_gene_ensembl_structure get exon_id where
transcript_id in (ENST00000357654);
MartShell> use hsapiens_gene_ensembl_structure get exon_id where
stable_transcript_id in (ENST00000357654);
MartShell> use hsapiens_gene_ensembl_structure get exon_id where
str_transcript_id in (ENST00000357654);
MartShell>

Now, from ensembl.org, it's clear that there are 23 exons with this
transcript id. So my third question is, what am I doing wrong here?


You can't use the structure dataset because it is an invisible dataset (see above).

Also, I can't see an exon_id attribute. You can find all available attributes with the "list attributes" command or, on Linux, with "get <tab><tab>". BTW, the martj query library is a bit behind the Perl library, and you can't really use placeholder attributes at the moment. We are planning a martj upgrade soon.




While I'm at it, what's the difference between transcript_id,
stable_transcript_id, and str_transcript_id (same question for gene IDs)?


transcript_stable_id is of the format "ENS....", while transcript_id is the
internal numeric database ID.
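
(As a rough illustration of that distinction, here is a minimal Python sketch that sorts an identifier into "stable" or "internal". The function name classify_transcript_id is hypothetical, and the rule is deliberately crude: anything starting with "ENS" counts as a stable ID and anything made only of digits counts as an internal ID. It does not check that the ID exists in any database.)

```python
def classify_transcript_id(value: str) -> str:
    """Classify an identifier as 'stable', 'internal', or 'unknown'.

    Assumption for illustration only: stable IDs start with "ENS"
    (e.g. ENST00000357654) and internal database IDs are purely numeric.
    """
    text = value.strip()
    if text.upper().startswith("ENS"):
        return "stable"
    if text.isdigit():
        return "internal"
    return "unknown"


if __name__ == "__main__":
    for candidate in ["ENST00000357654", "12345", "BRCA1"]:
        print(candidate, classify_transcript_id(candidate))
```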


and how do I know which filters in hsapiens_gene_ensembl_structure match
up with attributes in hsapiens_gene_ensembl?

not sure if I understand this question :)

a.


I'd better stop before I ask too many questions.

- Amir Karger
Computational Biology Group
Bauer Center for Genomics Research
Harvard University
617-496-0626



-------------------------------------------------------------------------------
Arek Kasprzyk
EMBL-European Bioinformatics Institute.
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
Tel: +44-(0)1223-494606
Fax: +44-(0)1223-494468
-------------------------------------------------------------------------------


