Re: [mart-dev] Re: Problems generating BioMart queries

Damian Smedley Fri, 12 May 2006 07:01:40 -0700

On Fri, 12 May 2006, David Withers wrote:

> Arek Kasprzyk wrote:
> > 
> > On 11 May 2006, at 17:14, David Withers wrote:
> > 
> >> Hi Arek,
> >>
> >> I'm trying to run the following query:
> >>
> >> <?xml version="1.0" encoding="UTF-8"?>
> >> <!DOCTYPE Query>
> >> <Query virtualSchemaName="default" count="0">
> >>  <Dataset name="hsapiens_gene_ensembl">
> >>   <Filter name="gene_stable_id" value="ENSG00000100031" />
> >>  </Dataset>
> >>  <Dataset name="hsapiens_genomic_sequence">
> >>   <Attribute name="coding_gene_flank" />
> >>  </Dataset>
> >>  <Dataset name="hsapiens_gene_ensembl_structure">
> >>   <Attribute name="gene_stable_id" />
> >>  </Dataset>
> >>  <Links source="hsapiens_genomic_sequence"
> >> target="hsapiens_gene_ensembl" defaultLink="coding_gene_flank" />
> >>  <Links source="hsapiens_gene_ensembl_structure"
> >> target="hsapiens_gene_ensembl" defaultLink="" />
> >> </Query>
> >>
> >> but I don't know how to work out the link between
> >> hsapiens_gene_ensembl_structure and hsapiens_gene_ensembl.
> >>
> >> Do I need to specify links between all the datasets?
> >>
> > 
> > Hi David,
> > I am afraid so. The current 0.4 implementation for web services
> > and API query handling is rather complicated as it requires people
> > to deal with  visible and invisible datasets, placeholders, links and in
> > more complex cases
> > pretty much the execution path of the query.
> > All of it will be removed in 0.5 which will not require for clients to
> > deal with any of it
> > explicitly and instead will be dealt with by the library on the server
> > side. The query xml
> > will only require to specify visible datasets and their attributes and
> > filters. This is going
> > to be very straightforward.
> > 
> > Since you are developing a client which is supposed to work against 0.5
> > you have two choices
> > here. You can either follow a few examples that we can provide on how to
> > create query.xml in
> > cases involving more than one dataset and test it against the 0.4 server
> > (www.biomart.org/biomart/martservice)
> > or wait for a couple of weeks when we have a new 0.5 test server running
> > with much simpler spec of query.xml
> > (alternatively we can provide you the spec for 0.5 query right now and
> > you can develop against
> > it until it becomes 'testable' in a couple of weeks or so  with the
> > arrival of a new test server so it does not hold back your development)
> > 
> > Please let us know what your preference is
> > a.
> 
> Hi Arek,


Hi David,

Arek is away for a few hours so maybe I can help

> 
> Lots of our users want to use BioMart now so I need to try and get
> Taverna working with 0.4. I can construct query.xml for most of my
> tests, including ones with multiple datasets. My problems so far are:
> 1. linking datasets where none of the importables/exportables match

Not quite sure what you mean here - can you give me an example? I guess you 
may mean situations where you need to use sequence dataset attributes from 
within a gene dataset. In this scenario the chain of linking is actually 
(for the human example):

hsapiens_gene_ensembl -> hsapiens_gene_ensembl_structure -> 
hsapiens_genomic_sequence

To get these links where there is no direct link between 2 datasets you 
have to use the Registry->getPath method and cycle through each pair of 
datasets, getting the link between them and adding it to the query. If this 
is the case I can give you some more code pointers. As Arek has probably 
pointed out earlier, this sort of logic is moving to the API for 0_5, so if 
a filter/attribute is available in a dataset you can just add it to the 
query and not worry about the linking logic.
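
As a rough illustration of the "cycle through each pair" idea, here is a 
minimal Python sketch. It is not the BioMart Perl API: the LINKS table, the 
link names ("link_a", "link_b") and the function links_along_path are all 
hypothetical stand-ins for whatever the registry would return, and the 
source/target direction shown is an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical link table: (source, target) -> link name.
# Real link names would come from the registry; these are placeholders.
LINKS: Dict[Tuple[str, str], str] = {
    ("hsapiens_gene_ensembl", "hsapiens_gene_ensembl_structure"): "link_a",
    ("hsapiens_gene_ensembl_structure", "hsapiens_genomic_sequence"): "link_b",
}


def links_along_path(
    path: List[str], links: Dict[Tuple[str, str], str] = LINKS
) -> List[Tuple[str, str, str]]:
    """Return (source, target, link_name) for each consecutive dataset pair."""
    result = []
    for source, target in zip(path, path[1:]):
        key = (source, target)
        if key not in links:
            raise KeyError(f"no direct link between {source} and {target}")
        result.append((source, target, links[key]))
    return result


# The chain from the message above: gene -> structure -> sequence.
path = [
    "hsapiens_gene_ensembl",
    "hsapiens_gene_ensembl_structure",
    "hsapiens_genomic_sequence",
]
for source, target, name in links_along_path(path):
    print(source, "->", target, "via", name)
```

Each tuple produced this way would correspond to one Links element in a 0.4 
query.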

> 2. sequences - I use the datasetLink value specified by the sequence
> attribute but this gives me an error.
> 

the above example may explain this - are you adding the datasetLink value 
to the structure->sequence dataset link?

> Any examples you can give me on how to create query.xml will be very useful.
> 
> I would also like the 0.5 spec so we can ready for when the 0.5 server
> is running.
> 

A few Query XML examples are probably the best thing:

(i) Query involving mouse gene filters and attributes linked into a human 
gene query which also includes a placeholder attribute from the structure 
dataset (str_chrom_name from hsapiens_gene_ensembl_structure dataset). 
Note no link definitions are required - the API handles all this now and 
links the bottom dataset into the top one using the defaultLink. You can 
still define a Link using the tags as before if you want to use a 
different link to the default. Also note that the maximum number of 
Dataset tags you should need is 2 as only 2 visible datasets are allowed 
at the moment. All placeholder attributes and filters are now defined from 
within the Dataset.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query  virtualSchemaName = "default" count = "0" >
          <Dataset name = "hsapiens_gene_ensembl">
              <Attribute name = "ensembl_gene_id" />
              <Attribute name = "chromosome_name" />
              <Attribute name = "hsapiens_gene_ensembl_structure__str_chrom_name" />
              <Filter name = "chr_name" value = "1"/>
          </Dataset>

          <Dataset name = "mmusculus_gene_ensembl">
              <Attribute name = "ensembl_gene_id" />
              <Attribute name = "chromosome_name" />
              <Filter name = "chr_name" value = "1"/>
          </Dataset>
</Query>


(ii) same but with some placeholder filters

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query  virtualSchemaName = "default" count = "0" >
          <Dataset name = "hsapiens_gene_ensembl">
              <Attribute name = "ensembl_gene_id" />
              <Attribute name = "chromosome_name" />
              <Attribute name = "hsapiens_gene_ensembl_structure__str_chrom_name" />
              <Filter name = "hsapiens_band_start__band_start" value = "p36.33"/>
              <Filter name = "hsapiens_band_end__band_end" value = "q43"/>
              <Filter name = "chr_name" value = "1"/>
          </Dataset>

          <Dataset name = "mmusculus_gene_ensembl">
              <Attribute name = "ensembl_gene_id" />
              <Attribute name = "chromosome_name" />
              <Filter name = "chr_name" value = "1"/>
          </Dataset>
</Query>

(iii) sequence example

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query  virtualSchemaName = "default" count = "0" >
          <Dataset name = "hsapiens_gene_ensembl">
              <Attribute name = "hsapiens_gene_ensembl_structure__str_chrom_name" />
              <Attribute name = "hsapiens_genomic_sequence__peptide" />
              <Filter name = "chr_name" value = "1"/>
          </Dataset>

          <Dataset name = "mmusculus_gene_ensembl">
              <Attribute name = "ensembl_gene_id" />
              <Attribute name = "chromosome_name" />
              <Filter name = "chr_name" value = "1"/>
          </Dataset>
</Query>


The API will work in exactly the same way as the Query XML. You can just 
getAttributeByName or getFilterByName from the available Dataset 
configTrees and add them to a query. No links are required to be added by 
the client.
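
For a client that builds the Query XML itself, the examples above could be 
generated along these lines. This is only a sketch in Python using the 
standard library, not the BioMart API: build_query and placeholder are 
hypothetical helper names, and the "dataset__name" pattern is simply read 
off the str_chrom_name example above, so it may not hold for every 
placeholder.

```python
import xml.etree.ElementTree as ET


def placeholder(dataset: str, name: str) -> str:
    """Join a dataset name and an attribute name with a double underscore."""
    return f"{dataset}__{name}"


def build_query(datasets):
    """Build a Query element from (name, attributes, filters) tuples.

    attributes is a list of names; filters is a list of (name, value) pairs.
    No Links elements are added, matching the examples above.
    """
    query = ET.Element("Query", {"virtualSchemaName": "default", "count": "0"})
    for name, attributes, filters in datasets:
        dataset = ET.SubElement(query, "Dataset", {"name": name})
        for attribute in attributes:
            ET.SubElement(dataset, "Attribute", {"name": attribute})
        for filter_name, value in filters:
            ET.SubElement(dataset, "Filter", {"name": filter_name, "value": value})
    return query


query = build_query([
    (
        "hsapiens_gene_ensembl",
        [
            "ensembl_gene_id",
            "chromosome_name",
            placeholder("hsapiens_gene_ensembl_structure", "str_chrom_name"),
        ],
        [("chr_name", "1")],
    ),
    (
        "mmusculus_gene_ensembl",
        ["ensembl_gene_id", "chromosome_name"],
        [("chr_name", "1")],
    ),
])

# ElementTree does not write the DOCTYPE line, so prepend the header by hand.
header = '<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE Query>\n'
xml_text = header + ET.tostring(query, encoding="unicode")
print(xml_text)
```

This reproduces the structure of example (i): two Dataset elements, each 
with its own attributes and filters, and nothing else.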

The other concept is that of formatters set on the Query. This defaults to 
TSV (tab separated values) but you can also call Query->formatter('HTML') 
using the API or set the formatter attribute in the Query tag of the XML 
to get formatted output from either the API or martservices.
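
On the XML side this amounts to one extra attribute on the Query tag. A 
small Python sketch, again using only the standard library: with_formatter 
is a hypothetical helper name, and 'HTML' is the only non-default value 
mentioned here.

```python
import copy
import xml.etree.ElementTree as ET


def with_formatter(query: ET.Element, formatter: str) -> ET.Element:
    """Return a copy of the Query element with the formatter attribute set."""
    updated = copy.deepcopy(query)
    updated.set("formatter", formatter)
    return updated


query = ET.Element("Query", {"virtualSchemaName": "default", "count": "0"})
html_query = with_formatter(query, "HTML")
print(ET.tostring(html_query, encoding="unicode"))
```

Leaving the attribute off keeps the default TSV output described above.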

Let me know what else you need - we should have a test server up for you 
soon

Best wishes

Damian


> David.
> 
> -- 
> David Withers
> School of Computer Science, University of Manchester,
> Oxford Road, Manchester, M13 9PL, UK.
> Tel: +44(0)161 275 0145
>
