Hello Syed,
Thank you very much -- replacing MartURLLocation with MartDBLocation
did indeed solve the problem completely. So it seems that the warnings
are not propagated over the web service. However, that does not explain
why, when I use the web service and request a flank sequence with a
valid upstream_flank value, I still get a blank screen, although the
same request yields sequences when using the DB service.
Do you recommend using the BioMart web or database service for a
production server? I had favored the web approach to beat firewalls, and
performance is not too much of an issue.
As for the question about TSS +/- 2kb: I thought that the TSS, the
transcript start position, and the first exon start position were all
the same value (trans-splicing making that picture more complicated). So
my question was whether you might consider developing the option to
specify an upstream and downstream "flank" relative to the transcript
start position. Currently, if I choose to retrieve the flank, I cannot
have that flank extend into the transcript.
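To make the request concrete, the window I have in mind could be computed as in the sketch below. This is only an illustration, not something BioMart provides: the function name tss_window and the example coordinates are hypothetical, and it assumes 1-based inclusive coordinates on the forward strand with strand given as +1 or -1, and that the transcript start is taken as the TSS.

```python
def tss_window(transcript_start, transcript_end, strand,
               upstream=2000, downstream=2000):
    """Return (start, end) of a window around the transcript start site.

    Assumes 1-based inclusive forward-strand coordinates and a strand
    of +1 or -1. The window is clamped at position 1 but not at the
    chromosome end, since the chromosome length is not known here.
    """
    if strand not in (1, -1):
        raise ValueError("strand must be +1 or -1")
    if strand == 1:
        # Forward strand: the TSS is the lowest coordinate of the transcript.
        tss = transcript_start
        start, end = tss - upstream, tss + downstream
    else:
        # Reverse strand: the TSS is the highest coordinate, and
        # "upstream" points towards larger coordinates.
        tss = transcript_end
        start, end = tss - downstream, tss + upstream
    return max(1, start), end


if __name__ == "__main__":
    # Hypothetical transcript spanning 10000..15000 on each strand.
    print(tss_window(10000, 15000, 1))
    print(tss_window(10000, 15000, -1))
```

Unlike the current flank options, the resulting region extends into the transcript by the downstream amount.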
Best,
Alexandre
Syed Haider wrote:
Hi Alexandre,
On Fri, 2008-07-25 at 09:05 +0200, Alexandre Gattiker wrote:
Thank you very much, that (mostly) solved it!
I can now retrieve transcript/gene/etc. sequences. But if I select one
of the four Flank options, I get a blank result and a warning in the log:
Use of uninitialized value in concatenation (.) or string at
biomart-perl/lib/BioMart/Web.pm line 2446
Ok, this is probably because the warning message isn't getting propagated
correctly over the web service. Can you configure your Apache using
<MartDBLocation> ... </MartDBLocation> type connection params in your
registry instead of <MartURLLocation>? I sent you these in my first
email yesterday. That will connect you to the Ensembl public databases
directly instead of going via www.biomart.org. I am hoping this will
resolve the flanks problem.
Running the same query at biomart.org however yields a screen message
and alert box:
Validation Error: Requests for flank sequence must be accompanied by an
upstream_flank or downstream_flank request
<http://www.biomart.org/biomart/martview?VIRTUALSCHEMANAME=default&ATTRIBUTES=mmusculus_gene_ensembl.default.sequences.gene_stable_id|mmusculus_gene_ensembl.default.sequences.str_chrom_name|mmusculus_gene_ensembl.default.sequences.struct_biotype|mmusculus_gene_ensembl.default.sequences.coding_gene_flank&FILTERS=mmusculus_gene_ensembl.default.filters.ensembl_gene_id."ENSMUSG00000055866"&VISIBLEPANEL=resultspanel>
Then, if I do select an upstream_flank, I still get a blank page, while the
query works at biomart.org:
<http://www.biomart.org/biomart/martview?VIRTUALSCHEMANAME=default&ATTRIBUTES=mmusculus_gene_ensembl.default.sequences.gene_stable_id|mmusculus_gene_ensembl.default.sequences.str_chrom_name|mmusculus_gene_ensembl.default.sequences.struct_biotype|mmusculus_gene_ensembl.default.sequences.coding_gene_flank|mmusculus_gene_ensembl.default.sequences.upstream_flank."10"&FILTERS=mmusculus_gene_ensembl.default.filters.ensembl_gene_id."ENSMUSG00000055866"&VISIBLEPANEL=resultspanel>
NB I'm using biomart embedded into Ensembl.
Another issue:
I'm trying to fetch the sequence around the transcription start site
(e.g. -2 kb to +2 kb) for promoter analysis. Is there a way to do that?
You can only retrieve the +/- sequences with respect to the start of the
first exon. TSS coordinates are, I guess, not available in the Ensembl
database. I have cc'ed Glenn to confirm.
cheers
syed
Best regards
Alexandre
Syed Haider wrote:
An even better solution: just add this to your existing registry:
<MartURLLocation
    name            = "sequence"
    displayName     = "Sequence (release 49)"
    host            = "www.biomart.org"
    port            = "80"
    visible         = ""
    default         = ""
    includeDatasets = "mmusculus_genomic_sequence"
    martUser        = ""
/>
On Thu, 2008-07-24 at 18:22 +0200, Alexandre Gattiker wrote:
Hello,
Kudos for this great piece of software. I managed to whip up a very
functional biomart by mashing up some lab data with the Ensembl biomart,
almost accidentally, as I didn't even expect that to be possible! It's
rare enough that software works even better than advertised and in such
a modular way.
I have a small issue, however. When I go to the Attributes -> Sequences
page, the SEQUENCES section has:
No visible attributes in collection seq_scope_type
No visible attributes in collection upstream
No visible attributes in collection downstream
I assume that's linked to warnings I get running configure.pl:
Setting possible links between datasets
....(scanning) 33% WARNING: Pointer attributes from
mmusculus_genomic_sequence will not be available as
mmusculus_genomic_sequence not in registry
WARNING: Pointer attributes from mmusculus_genomic_sequence will
not be available as mmusculus_genomic_sequence not in registry
WARNING: Pointer attributes from mmusculus_genomic_sequence will
not be available as mmusculus_genomic_sequence not in registry
My config is as follows. I have biomart 0.7.
<MartURLLocation
    name            = "ensembl"
    displayName     = "Ensembl Genes (release 49)"
    host            = "www.biomart.org"
    port            = "80"
    visible         = "1"
    default         = ""
    includeDatasets = "mmusculus_gene_ensembl"
    martUser        = ""
/>
I tried
includeDatasets = "mmusculus_gene_ensembl,mmusculus_genomic_sequence"
but that didn't solve the problem. I also tried to leave includeDatasets
empty but I still get the warning (now for all species).
Best,
Alexandre
--
======================================
Syed Haider.
EMBL-European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
======================================
--
--------------------------------------------------------
Alexandre Gattiker Bioinformatics & Biostatistics Core Facility
EPFL School of Life Sciences / Faculté des Sciences de la vie FSV
http://people.epfl.ch/Alexandre.Gattiker