Hi,

Can anyone help with our problem getting federated queries working? (See the emails below.) We would like the Ensembl database to appear in the datasets section (with our own mart), not in the database section of the mart. Junjun has not been available to continue replying to our problem. Hope you can help...

Thanks,
Jenny + Luca

-----Original Message-----
From: Mead, Jennifer
Sent: 16 June 2009 16:50
To: Junjun Zhang
Subject: RE: [mart-dev] Federating queries

Hi Junjun,

When we try to restart Apache, we get the error message below (although we did not edit the conf file it refers to ourselves), and the mart fails to display at all:

Syntax error on line 71 of /home/jenny/biomart/biomart-perl/conf/httpd.conf:
Invalid command 'PerlOptions', perhaps misspelled or defined by a module not included in the server configuration

However, using an old version of the httpd conf file, from before we made any changes to the mart or registry file, we get the two databases, where you can have either Ensembl or 'trans' datasets, so you cannot query all datasets together. This is the problem we had before. We have made the changes you described in the mart editor, but we still do not get the desired result. Any ideas?

Thanks,
J + L

________________________________________
From: Junjun Zhang [[email protected]]
Sent: 16 June 2009 15:03
To: Mead, Jennifer
Cc: [email protected]
Subject: RE: [mart-dev] Federating queries

Hi Jenny and Luca,

You may want to use the current version (i.e., 54) of the Ensembl mart.

In order to federate your own mart with a fully functional Ensembl mart, you need to have the following four entries in your registry file, together with your own entry, between the <virtualSchema> tags:

<MartDBLocation databaseType="mysql" host="martdb.ensembl.org"
    database="ensembl_mart_54" name="ensembl"
    displayName="ENSEMBL 54 GENES (SANGER UK)" port="5316"
    schema="ensembl_mart_54" user="anonymous" password=""
    visible="1" default="1" martUser="" includeDatasets="" />

<MartDBLocation databaseType="mysql" host="martdb.ensembl.org"
    database="genomic_features_mart_54" name="genomic_features"
    displayName="ENSEMBL 54 GENOMIC FEATURES (SANGER UK)" port="5316"
    schema="genomic_features_mart_54" user="anonymous" password=""
    visible="0" default="0" martUser="" includeDatasets="" />

<MartDBLocation databaseType="mysql" host="martdb.ensembl.org"
    database="ontology_mart_54" name="ontology"
    displayName="ENSEMBL 54 ONTOLOGY (SANGER UK)" port="5316"
    schema="ontology_mart_54" user="anonymous" password=""
    visible="0" default="0" martUser="" includeDatasets="" />

<MartDBLocation databaseType="mysql" host="martdb.ensembl.org"
    database="sequence_mart_54" name="sequence"
    displayName="ENSEMBL 54 SEQUENCE (SANGER UK)" port="5316"
    schema="sequence_mart_54" user="anonymous" password=""
    visible="0" default="0" martUser="" includeDatasets="" />

As for defining the Importable/Exportable pair: since Ensembl already has an Exportable defined (see the attached screenshot), you just need to define an Importable in your dataset that matches Ensembl's Exportable. The other attached screenshot shows how to define the Importable in your dataset.

Let us know how it goes, and please feel free to get back to us should you have any questions.

Best regards,
Junjun

________________________________
From: Mead, Jennifer [mailto:[email protected]]
Sent: Tuesday, June 16, 2009 9:02 AM
To: Junjun Zhang
Cc: [email protected]
Subject: RE: [mart-dev] Federating queries

Hi Junjun,

We are struggling to get federated queries working. We want to have our mart db, called 'trans', as the database, and two federated datasets: 1) Ensembl gene and 2) our own 'trans-A' dataset. Both Ensembl and trans-A have an Ensembl gene ID, so we want to link on this. In trans this field is called 'ensg_id_1017'; in Ensembl it is 'ensembl_gene_id', but they hold the same values.

Where we are so far:

1. We have added the URL location for the Ensembl BioMart to the registry file (as Syed suggested):

<MartURLLocation database="genomic_features_mart_53" default="0"
    displayName="ENSEMBL 53 GENOMIC FEATURES (SANGER UK)"
    host="www.biomart.org" includeDatasets="" martUser=""
    name="genomic_features" path="/biomart/martservice" port="80"
    serverVirtualSchema="default" visible="0" />

We added this between the <virtualSchema> tags, so there are two blocks of text within the virtualSchema tags: one for trans (real) and one for Ensembl (URL, virtual). We're not sure we put this in the right position. Should both the trans and Ensembl marts be inside these tags?

2. We added importable/exportable specifiers in the mart editor, but we don't know if we did it correctly. The specific values are probably wrong. Should internalName be the Ensembl field name or the trans field name in our case? And should it be the trans field name for the exportable and the Ensembl name for the importable? Also, what is the linkName in this case: the Ensembl field name or the trans one?

When we refresh the config file, restart Apache, etc., we see two databases in our MartView, trans and Ensembl. There is no way to query both at the same time, only separately as discrete DBs. The link between the two is not working. Can you help?

Thanks,
Jenny and Luca

From: Junjun Zhang [mailto:[email protected]]
Sent: 29 April 2009 15:30
To: Mead, Jennifer
Subject: Re: [mart-dev] Federating queries

Hi Jennifer,

I thought the following email, which I sent to other users, might be helpful to you as well. Please feel free to let us know if you have any further questions.

Regards,
Junjun

______________________________________________

Let me try to give you a brief explanation of how a join query is done in BioMart.

Datasets are joined through an Importable/Exportable pair predefined using MartEditor. An Importable acts as a filter: it points to a filter (which you have defined earlier under one of the FilterPages). Similarly, an Exportable acts as an attribute: it points to an attribute. See the screenshot below for an example of how an Importable/Exportable pair is defined.

Once the pair is defined in MartEditor (the Importable in one dataset, the Exportable in the other; for more detailed instructions, please see the document Christina sent to you, page 10?), we prepare a registry file that includes both datasets. Finally, we run bin/configure.pl to configure MartView with the registry settings. It is at this step that all the links (Importable/Exportable pairs) are determined and stored.

Using the datasets in the screenshot as the example: you select both datasets in the MartView GUI, choose attributes from both datasets, set filters on one or both, and then fire the query. Let's say dataset MSD would return 100 rows and dataset gene_ensembl would return 3000 rows if the query were run independently on each dataset. Because there is a link between these two datasets, dataset MSD exports all 100 'pdb_id' values to dataset gene_ensembl as an ID list, which is used to filter (by searching for matching IDs) the 3000 rows returned by gene_ensembl. After filtering, only rows with matching pdb_ids (the intersection set) are joined and reported; hence, a join query.

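The export, filter, and join steps described above can be sketched in a few lines of Python. This is only a toy illustration of the idea, not BioMart's actual implementation: the function name `federated_join`, the column name `pdb` on the gene_ensembl side, and all row values are made up for the example.

```python
def federated_join(export_rows, import_rows, exportable, importable):
    """Toy model of a linked query between two datasets.

    export_rows / import_rows are the rows each dataset would return
    if queried independently; exportable / importable are the column
    names that the Exportable and Importable point to (hypothetical).
    """
    # Step 1: the exporting dataset hands over its Exportable values
    # as an ID list.
    id_list = {row[exportable] for row in export_rows}

    # Step 2: the importing dataset applies that ID list through its
    # Importable filter, keeping only rows with a matching ID.
    matching = [row for row in import_rows if row[importable] in id_list]

    # Step 3: rows that share an ID are merged and reported together.
    by_id = {}
    for row in export_rows:
        by_id.setdefault(row[exportable], []).append(row)
    joined = []
    for row in matching:
        for left in by_id[row[importable]]:
            joined.append({**left, **row})
    return joined


# Made-up rows standing in for the MSD and gene_ensembl results.
msd_rows = [
    {"pdb_id": "1abc", "experiment_type": "X-ray"},
    {"pdb_id": "2xyz", "experiment_type": "NMR"},
]
gene_rows = [
    {"pdb": "1abc", "ensembl_gene_id": "ENSG_A"},
    {"pdb": "9zzz", "ensembl_gene_id": "ENSG_B"},
]

result = federated_join(msd_rows, gene_rows, "pdb_id", "pdb")
for row in result:
    print(row)
```

Only the row whose ID appears in both datasets survives, which is the intersection behaviour described in the email. Note that the two column names do not have to be identical; only the values they hold need to correspond.
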
As in your example with multiple datasets to be joined, we can do it by chaining the process described above, i.e., exporting IDs from dataset 1 to dataset 2, exporting the intersection IDs to dataset 3, and so on until the last one.

For non-standard IDs, as Christina pointed out, you can keep the standard ID and all of its synonyms in a dimension table and define a filter on the ID synonym column. When filtering on this filter, matching any of the synonyms will retrieve the desired row.

As you may know, the current BioMart 0.7 release supports joins of two datasets. Joins of multiple datasets will be supported in the next release.

Hope this is useful. Please feel free to write to us should you have any questions.

Junjun
