Hi 
Can anyone help with our problem getting federated queries working?
(See the emails below.) We would like the Ensembl database to appear in the
datasets section (alongside our own mart), not in the database section of the mart.
Junjun has not been available to continue replying about our problem. Hope you
can help...
Thanks
Jenny + Luca

-----Original Message-----
From: Mead, Jennifer 
Sent: 16 June 2009 16:50
To: Junjun Zhang
Subject: RE: [mart-dev] Federating queries

Hi Junjun,


When we try to restart Apache, we get the following error message (we did not
edit the conf file it refers to ourselves), and the mart fails to display at
all:

Syntax error on line 71 of /home/jenny/biomart/biomart-perl/conf/httpd.conf:
Invalid command 'PerlOptions', perhaps misspelled or defined by a module not included in the server configuration

However, if we use an old version of the httpd.conf file, from before we made
any changes to the mart or registry file, we get the two databases, where you
can choose either Ensembl or 'trans' datasets, so you cannot query all the
datasets together. This is the problem we had before.
We have made the changes you described in MartEditor, but we still do not
get the desired result. Any ideas?
Thanks, J + L


________________________________________
From: Junjun Zhang [[email protected]]
Sent: 16 June 2009 15:03
To: Mead, Jennifer
Cc: [email protected]
Subject: RE: [mart-dev] Federating queries

Hi Jenny and Luca,

You may want to use the current version (i.e., 54) of the Ensembl mart.

In order to federate your own mart with a fully functional Ensembl mart, you
need the following four entries in your registry file, together with your own
entry, between the <virtualSchema> tags:

<MartDBLocation databaseType="mysql" host="martdb.ensembl.org"
database="ensembl_mart_54" name="ensembl"
displayName="ENSEMBL 54 GENES (SANGER UK)"
port="5316" schema="ensembl_mart_54" user="anonymous" password=""
visible="1" default="1" martUser="" includeDatasets="" />

<MartDBLocation databaseType="mysql" host="martdb.ensembl.org"
database="genomic_features_mart_54" name="genomic_features"
displayName="ENSEMBL 54 GENOMIC FEATURES (SANGER UK)"
port="5316" schema="genomic_features_mart_54" user="anonymous" password=""
visible="0" default="0" martUser="" includeDatasets="" />

<MartDBLocation databaseType="mysql" host="martdb.ensembl.org"
database="ontology_mart_54" name="ontology"
displayName="ENSEMBL 54 ONTOLOGY (SANGER UK)"
port="5316" schema="ontology_mart_54" user="anonymous" password=""
visible="0" default="0" martUser="" includeDatasets="" />

<MartDBLocation databaseType="mysql" host="martdb.ensembl.org"
database="sequence_mart_54" name="sequence"
displayName="ENSEMBL 54 SEQUENCE (SANGER UK)"
port="5316" schema="sequence_mart_54" user="anonymous" password=""
visible="0" default="0" martUser="" includeDatasets="" />
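To make the placement concrete, here is a small Python sketch (not part of BioMart) that holds a minimal registry as a string and checks that your own mart and the four Ensembl marts all sit inside the same <virtualSchema>. The <MartRegistry> root element, the schema name "default" and the local "trans" entry (host, port, user) are assumptions for illustration; the Ensembl values are copied from the entries above, with the three hidden marts abbreviated (use the full entries in a real registry).

```python
import xml.etree.ElementTree as ET

# Minimal registry sketch. ASSUMED: the <MartRegistry> root, the schema name
# "default" and the hypothetical local "trans" entry. Ensembl values are taken
# from the entries above; the three hidden marts are abbreviated here.
REGISTRY = """
<MartRegistry>
  <virtualSchema name="default" visible="1">
    <MartDBLocation databaseType="mysql" host="localhost" port="3306"
        database="trans" schema="trans" name="trans" displayName="TRANS"
        user="martuser" password="" visible="1" default="0"
        martUser="" includeDatasets="" />
    <MartDBLocation databaseType="mysql" host="martdb.ensembl.org" port="5316"
        database="ensembl_mart_54" schema="ensembl_mart_54" name="ensembl"
        displayName="ENSEMBL 54 GENES (SANGER UK)" user="anonymous" password=""
        visible="1" default="1" martUser="" includeDatasets="" />
    <MartDBLocation databaseType="mysql" host="martdb.ensembl.org" port="5316"
        database="genomic_features_mart_54" schema="genomic_features_mart_54"
        name="genomic_features" user="anonymous" password="" visible="0" default="0" />
    <MartDBLocation databaseType="mysql" host="martdb.ensembl.org" port="5316"
        database="ontology_mart_54" schema="ontology_mart_54"
        name="ontology" user="anonymous" password="" visible="0" default="0" />
    <MartDBLocation databaseType="mysql" host="martdb.ensembl.org" port="5316"
        database="sequence_mart_54" schema="sequence_mart_54"
        name="sequence" user="anonymous" password="" visible="0" default="0" />
  </virtualSchema>
</MartRegistry>
"""

LOCATION_TAGS = ("MartDBLocation", "MartURLLocation")


def schema_of_each_mart(registry_xml):
    """Map each mart's name attribute to the name of its enclosing virtualSchema."""
    root = ET.fromstring(registry_xml.strip())
    placement = {}
    for schema in root.iter("virtualSchema"):
        for location in schema:
            if location.tag in LOCATION_TAGS:
                placement[location.get("name")] = schema.get("name")
    return placement


def share_one_schema(registry_xml, mart_names):
    """True if every named mart is present and all sit in the same virtualSchema."""
    placement = schema_of_each_mart(registry_xml)
    schemas = {placement.get(name) for name in mart_names}
    return None not in schemas and len(schemas) == 1


if __name__ == "__main__":
    marts = ["trans", "ensembl", "genomic_features", "ontology", "sequence"]
    print(share_one_schema(REGISTRY, marts))  # True for the layout above
```

If your own mart ends up in a separate <virtualSchema> from the Ensembl entries, a check like this would return False, which matches the "two separate databases" symptom described elsewhere in this thread.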

As for defining the Importable/Exportable pair: since Ensembl already has an
Exportable defined (see the attached screenshot), you just need to define
an Importable in your dataset that matches Ensembl's Exportable. The other
attached screenshot shows how to define the Importable in your dataset.
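The matching can be pictured with the simplified Python model below. It assumes, as we understand MartEditor's behaviour, that an Importable and an Exportable pair up when their linkName and linkVersion are equal, while the filter and attribute they point to may have different column names. The dataset names and the linkName "ensembl_gene_link" are hypothetical; read the real linkName off Ensembl's Exportable in MartEditor. The column names ensg_id_1017 and ensembl_gene_id are the ones from Jenny and Luca's earlier email below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exportable:
    dataset: str
    link_name: str
    link_version: str
    attributes: tuple  # internal names of the attribute(s) it exports


@dataclass(frozen=True)
class Importable:
    dataset: str
    link_name: str
    link_version: str
    filters: tuple  # internal names of the filter(s) that receive the IDs


def matches(importable, exportable):
    """Simplified rule (an assumption): the pair links when linkName and
    linkVersion agree; the underlying column names need not be the same."""
    return (importable.link_name, importable.link_version) == (
        exportable.link_name,
        exportable.link_version,
    )


# Hypothetical values: Ensembl exports ensembl_gene_id; trans_A imports the
# same IDs through a filter on its own column, ensg_id_1017.
ensembl_exp = Exportable("hsapiens_gene_ensembl", "ensembl_gene_link", "", ("ensembl_gene_id",))
trans_imp = Importable("trans_A", "ensembl_gene_link", "", ("ensg_id_1017",))

if __name__ == "__main__":
    print(matches(trans_imp, ensembl_exp))  # True
```

Under this model, each side keeps its own column name; only the shared linkName (and linkVersion) ties the two datasets together.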

Let us know how it goes, please feel free to get back to us should you have any 
questions.

Best regards,
Junjun





________________________________
From: Mead, Jennifer [mailto:[email protected]]
Sent: Tuesday, June 16, 2009 9:02 AM
To: Junjun Zhang
Cc: [email protected]
Subject: RE: [mart-dev] Federating queries

Hi Junjun,

We are struggling to get federated queries working. We want to have our mart
db called 'trans' as the database, and two federated datasets: 1) Ensembl gene
and 2) our own 'trans-A' dataset. Both Ensembl and trans-A have the Ensembl
gene ID, so we want to link on this. In trans this field is called
'ensg_id_1017', while in Ensembl it is 'ensembl_gene_id', but they contain the
same values.

Where we are so far:

1. We have added the URL location for the Ensembl BioMart to the registry
file (as Syed suggested):

<MartURLLocation database="genomic_features_mart_53" default="0"
displayName="ENSEMBL 53 GENOMIC FEATURES (SANGER UK)"
host="www.biomart.org" includeDatasets="" martUser=""
name="genomic_features" path="/biomart/martservice" port="80"
serverVirtualSchema="default" visible="0" />

   We added this between the <virtualSchema> tags, so there are two blocks of
text within the <virtualSchema> tags: one for trans (real) and one for Ensembl
(URL, virtual). We're not sure we put this in the right position: should both
the trans and Ensembl marts be inside these tags?

2. We added Importable/Exportable specifiers in MartEditor, but we don't know
if we did it correctly.
   The specific values are probably wrong. Should internalName be the Ensembl
field name or the trans field name in our case? And should it be the trans
field name for the Exportable and the Ensembl name for the Importable?
Also, what is the linkName in this case: the Ensembl field name or the
trans one?

When we refresh the config file, restart Apache, etc., our MartView shows two
databases, trans and Ensembl. There is no way to query both at the same time,
only separately as discrete DBs. The link between the two is not working.

Can you help?
Thanks
Jenny and Luca






From: Junjun Zhang [mailto:[email protected]]
Sent: 29 April 2009 15:30
To: Mead, Jennifer
Subject: Re: [mart-dev] Federating queries

Hi Jennifer,

Just thought the following email I sent to other users may be helpful to you
as well.

Please feel free to let us know if you have any further questions.

Regards,

Junjun


______________________________________________

Let me try to give you a brief explanation of how join queries are done in BioMart.

Joining different datasets is done through an Importable/Exportable pair
predefined using MartEditor. The Importable acts as a filter: it points to a
filter (which you have defined earlier under one of the FilterPages). Similarly,
the Exportable acts as an attribute: it points to an attribute. See the
screenshot below for an example of how an Importable/Exportable pair is defined.

Once the pair is defined in MartEditor (Importable in one dataset, Exportable
in the other; for more detailed instructions, please see the document Christina
sent to you, page 10?), we then prepare a registry file that includes both
datasets. Finally, we run bin/configure.pl to configure MartView with the
registry settings. It is at this step that all the links (Importable/Exportable
pairs) are determined and stored.
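The link-building step can be pictured with the toy Python model below. It is not the Perl code in bin/configure.pl; it only illustrates the idea that a link from dataset A to dataset B is recorded when an Exportable in A and an Importable in B share a linkName, and each side points at an attribute or filter that actually exists in its dataset. The linkName "pdb_link" and the gene_ensembl filter name "pdb" are hypothetical; MSD, gene_ensembl and pdb_id come from the example that follows.

```python
from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class Dataset:
    name: str
    attributes: set
    filters: set
    exportables: dict = field(default_factory=dict)  # linkName -> attribute name
    importables: dict = field(default_factory=dict)  # linkName -> filter name


def find_links(datasets):
    """Return (exporting dataset, importing dataset, linkName) for every
    Exportable/Importable pair whose linkNames match and whose attribute and
    filter both exist in their datasets."""
    links = []
    for src, dst in permutations(datasets, 2):
        for link_name, attribute in src.exportables.items():
            filter_name = dst.importables.get(link_name)
            if filter_name is None:
                continue  # no Importable with this linkName in dst
            if attribute in src.attributes and filter_name in dst.filters:
                links.append((src.name, dst.name, link_name))
    return links


msd = Dataset("msd", {"pdb_id", "title"}, set(), exportables={"pdb_link": "pdb_id"})
gene_ensembl = Dataset(
    "gene_ensembl", {"ensembl_gene_id"}, {"pdb"}, importables={"pdb_link": "pdb"}
)

if __name__ == "__main__":
    print(find_links([msd, gene_ensembl]))  # [('msd', 'gene_ensembl', 'pdb_link')]
```

In this model, if the Importable pointed at a filter that was never defined on a FilterPage, no link would be recorded, and the two datasets would stay unconnected.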

Using the datasets in the screenshot as an example: you select both datasets
in the MartView GUI, choose attributes from both datasets, set filters on one
or both datasets, and then run the query. Let's say dataset MSD would return
100 rows and dataset gene_ensembl would return 3000 rows if the query were run
independently on each dataset. But since there is a link between these two
datasets, dataset MSD will export all 100 'pdb_id' values to dataset
gene_ensembl as an ID list to filter (search for matching IDs) the 3000 rows
returned by gene_ensembl. After filtering, only rows with matching pdb_ids (the
intersection set) will be joined and reported; hence, a join query.
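The same walk-through, reduced to plain Python over in-memory rows, might look like the sketch below. It only illustrates the export, filter, then join idea and is not BioMart's implementation; the row contents, the IDs and the gene_ensembl column name "pdb" are made up.

```python
def join_query(source_rows, target_rows, export_key, import_key):
    """Export the ID column of the source results as an ID list, use it to
    filter the target results, then join rows that share an ID."""
    exported_ids = {row[export_key] for row in source_rows}
    # Target rows survive only if their ID is in the exported list (intersection).
    surviving = [row for row in target_rows if row[import_key] in exported_ids]
    # Index source rows by ID so each surviving target row can be joined.
    source_by_id = {}
    for row in source_rows:
        source_by_id.setdefault(row[export_key], []).append(row)
    return [
        {**source_row, **target_row}
        for target_row in surviving
        for source_row in source_by_id[target_row[import_key]]
    ]


# Made-up rows standing in for the MSD and gene_ensembl results.
msd_rows = [
    {"pdb_id": "1ABC", "title": "kinase"},
    {"pdb_id": "2XYZ", "title": "protease"},
]
gene_rows = [
    {"ensembl_gene_id": "ENSG01", "pdb": "1ABC"},
    {"ensembl_gene_id": "ENSG02", "pdb": "9QQQ"},
    {"ensembl_gene_id": "ENSG03", "pdb": "2XYZ"},
]

if __name__ == "__main__":
    for row in join_query(msd_rows, gene_rows, "pdb_id", "pdb"):
        print(row)
```

Only ENSG01 and ENSG03 are reported here; ENSG02's PDB ID was never exported by the MSD side, so it is filtered out before the join.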

As in your example with multiple datasets to be joined, we can do it by
chaining the process described above, i.e., exporting IDs from dataset 1 to
dataset 2, exporting the intersection IDs to dataset 3, and so on until the
last one. For non-standard IDs, as Christina pointed out, you can keep the
standard ID and all of its synonyms in a dimension table and define a filter on
the ID synonym column. When filtering on this filter, matching any of the
synonyms will retrieve the desired row.
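A minimal Python sketch of both ideas in this paragraph, assuming every dataset in the chain is linked on the same ID column: chain_filter passes the surviving IDs from one dataset to the next, and filter_by_synonym mimics a filter on the synonym column of a dimension table. All dataset contents, IDs and the synonym "OLD_ID_7" are hypothetical.

```python
def chain_filter(datasets, key):
    """Dataset 1 exports its IDs to dataset 2; the IDs that match there (the
    intersection) are exported to dataset 3, and so on to the last dataset."""
    surviving = {row[key] for row in datasets[0]}
    for rows in datasets[1:]:
        surviving = {row[key] for row in rows if row[key] in surviving}
    return surviving


def filter_by_synonym(main_rows, synonym_rows, query_ids):
    """synonym_rows models a dimension table of (standard_id, synonym) pairs,
    with each standard ID also listed as its own synonym. Matching any synonym
    retrieves the main-table row for the corresponding standard ID."""
    wanted = {standard for standard, synonym in synonym_rows if synonym in query_ids}
    return [row for row in main_rows if row["id"] in wanted]


d1 = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
d2 = [{"id": "B"}, {"id": "C"}, {"id": "D"}]
d3 = [{"id": "C"}, {"id": "E"}]

main = [{"id": "ENSG01", "name": "gene x"}, {"id": "ENSG02", "name": "gene y"}]
synonyms = [("ENSG01", "ENSG01"), ("ENSG01", "OLD_ID_7"), ("ENSG02", "ENSG02")]

if __name__ == "__main__":
    print(chain_filter([d1, d2, d3], "id"))  # {'C'}
    print(filter_by_synonym(main, synonyms, {"OLD_ID_7"}))
```

The chained version models the multi-dataset behaviour mentioned for the next release, not the two-dataset join available in BioMart 0.7.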

As you may know, the current BioMart 0.7 release supports joining two datasets.
Joins of multiple datasets will be supported in the next release.

Hope this is useful. Please feel free to write to us should you have any 
questions.

Junjun

