On Fri, 12 May 2006, David Withers wrote:
> Arek Kasprzyk wrote:
> >
> > On 11 May 2006, at 17:14, David Withers wrote:
> >
> >> Hi Arek,
> >>
> >> I'm trying to run the following query:
> >>
> >> <?xml version="1.0" encoding="UTF-8"?>
> >> <!DOCTYPE Query>
> >> <Query virtualSchemaName="default" count="0">
> >> <Dataset name="hsapiens_gene_ensembl">
> >> <Filter name="gene_stable_id" value="ENSG00000100031" />
> >> </Dataset>
> >> <Dataset name="hsapiens_genomic_sequence">
> >> <Attribute name="coding_gene_flank" />
> >> </Dataset>
> >> <Dataset name="hsapiens_gene_ensembl_structure">
> >> <Attribute name="gene_stable_id" />
> >> </Dataset>
> >> <Links source="hsapiens_genomic_sequence"
> >> target="hsapiens_gene_ensembl" defaultLink="coding_gene_flank" />
> >> <Links source="hsapiens_gene_ensembl_structure"
> >> target="hsapiens_gene_ensembl" defaultLink="" />
> >> </Query>
> >>
> >> but I don't know how to work out the link between
> >> hsapiens_gene_ensembl_structure and hsapiens_gene_ensembl.
> >>
> >> Do I need to specify links between all the datasets?
> >>
> >
> > Hi David,
> > I am afraid so. The current 0.4 implementation for web services
> > and API query handling is rather complicated, as it requires people
> > to deal with visible and invisible datasets, placeholders, links and,
> > in more complex cases, pretty much the execution path of the query.
> > All of this will be removed in 0.5, which will not require clients to
> > deal with any of it explicitly; instead it will be handled by the
> > library on the server side. The query XML will only need to specify
> > visible datasets and their attributes and filters. This is going
> > to be very straightforward.
> >
> > Since you are developing a client which is supposed to work against
> > 0.5, you have two choices here. You can either follow a few examples
> > that we can provide on how to create query.xml in cases involving more
> > than one dataset and test it against the 0.4 server
> > (www.biomart.org/biomart/martservice), or wait a couple of weeks until
> > we have a new 0.5 test server running with a much simpler spec for
> > query.xml. (Alternatively, we can provide you with the spec for the 0.5
> > query right now and you can develop against it until it becomes
> > 'testable' in a couple of weeks or so with the arrival of a new test
> > server, so it does not hold back your development.)
> >
> > Please let us know what your preference is
> > a.
>
> Hi Arek,
Hi David,
Arek is away for a few hours, so maybe I can help.
>
> Lots of our users want to use BioMart now so I need to try and get
> Taverna working with 0.4. I can construct query.xml for most of my
> tests, including ones with multiple datasets. My problems so far are:
> 1. linking datasets where none of the importables/exportables match
I'm not quite sure what you mean here. Can you give me an example? I guess
you may mean situations where you need to use sequence dataset attributes
from within a gene dataset. In this scenario the chain of linking is
actually (for the human example):
hsapiens_gene_ensembl -> hsapiens_gene_ensembl_structure ->
hsapiens_genomic_sequence
Where there is no direct link between two datasets, you have to use the
Registry->getPath method and cycle through each pair of datasets in the
path, getting the link between them and adding it to the query. If this is
the case I can give you some more code pointers. As Arek has probably
pointed out earlier, this sort of logic is moving into the API for 0.5, so
if a filter/attribute is available in a dataset you can just add it to the
query and not worry about the linking logic.
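To illustrate the pair-by-pair walk described above, here is a minimal sketch in Python. It is not the BioMart Perl API: the lookup table stands in for whatever the registry reports, and the link names in it are made-up placeholders. The only part taken from this thread is the dataset chain itself.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: a hand-written stand-in for the links a registry would report.
# The link names are placeholders, not real BioMart link names.
DIRECT_LINKS: Dict[Tuple[str, str], str] = {
    ("hsapiens_gene_ensembl", "hsapiens_gene_ensembl_structure"):
        "placeholder_link_a",
    ("hsapiens_gene_ensembl_structure", "hsapiens_genomic_sequence"):
        "placeholder_link_b",
}


def links_along_path(path: List[str]) -> List[Tuple[str, str, str]]:
    """Return (first, second, link_name) for each adjacent pair in the path."""
    links = []
    for first, second in zip(path, path[1:]):
        link_name = DIRECT_LINKS.get((first, second))
        if link_name is None:
            # No direct link: this pair cannot be joined in a single step.
            raise KeyError(f"no direct link between {first} and {second}")
        links.append((first, second, link_name))
    return links


if __name__ == "__main__":
    chain = [
        "hsapiens_gene_ensembl",
        "hsapiens_gene_ensembl_structure",
        "hsapiens_genomic_sequence",
    ]
    for first, second, link_name in links_along_path(chain):
        print(f"{first} -> {second}: {link_name}")
```

Each tuple returned corresponds to one link definition that a 0.4 client would have to add to the query; asking for the gene and sequence datasets directly, without the structure dataset in between, raises a KeyError in this sketch.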
> 2. sequences - I use the datasetLink value specified by the sequence
> attribute but this gives me an error.
>
The example above may explain this. Are you adding the datasetLink value
to the structure->sequence dataset link?
> Any examples you can give me on how to create query.xml will be very useful.
>
> I would also like the 0.5 spec so we can be ready for when the 0.5 server
> is running.
>
A few Query XML examples are probably the best thing:
(i) A query involving mouse gene filters and attributes linked into a human
gene query, which also includes a placeholder attribute from the structure
dataset (str_chrom_name from the hsapiens_gene_ensembl_structure dataset).
Note that no link definitions are required: the API now handles all this
and links the bottom dataset into the top one using the defaultLink. You
can still define a link using the Links tags as before if you want to use
a link other than the default. Also note that the maximum number of
Dataset tags you should need is 2, as only 2 visible datasets are allowed
at the moment. All placeholder attributes and filters are now defined from
within the Dataset.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" count = "0" >
<Dataset name = "hsapiens_gene_ensembl">
<Attribute name = "ensembl_gene_id" />
<Attribute name = "chromosome_name" />
<Attribute name = "hsapiens_gene_ensembl_structure__str_chrom_name" />
<Filter name = "chr_name" value = "1"/>
</Dataset>
<Dataset name = "mmusculus_gene_ensembl">
<Attribute name = "ensembl_gene_id" />
<Attribute name = "chromosome_name" />
<Filter name = "chr_name" value = "1"/>
</Dataset>
</Query>
(ii) The same, but with some placeholder filters:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" count = "0" >
<Dataset name = "hsapiens_gene_ensembl">
<Attribute name = "ensembl_gene_id" />
<Attribute name = "chromosome_name" />
<Attribute name = "hsapiens_gene_ensembl_structure__str_chrom_name" />
<Filter name = "hsapiens_band_start__band_start" value = "p36.33"/>
<Filter name = "hsapiens_band_end__band_end" value = "q43"/>
<Filter name = "chr_name" value = "1"/>
</Dataset>
<Dataset name = "mmusculus_gene_ensembl">
<Attribute name = "ensembl_gene_id" />
<Attribute name = "chromosome_name" />
<Filter name = "chr_name" value = "1"/>
</Dataset>
</Query>
(iii) A sequence example:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" count = "0" >
<Dataset name = "hsapiens_gene_ensembl">
<Attribute name = "hsapiens_gene_ensembl_structure__str_chrom_name" />
<Attribute name = "hsapiens_genomic_sequence__peptide" />
<Filter name = "chr_name" value = "1"/>
</Dataset>
<Dataset name = "mmusculus_gene_ensembl">
<Attribute name = "ensembl_gene_id" />
<Attribute name = "chromosome_name" />
<Filter name = "chr_name" value = "1"/>
</Dataset>
</Query>
The API will work in exactly the same way as the Query XML. You can just
call getAttributeByName or getFilterByName on the available Dataset
configTrees and add the results to a query. The client does not need to
add any links.
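For a client that generates this kind of link-free Query XML programmatically, a minimal sketch in Python might look like the following. The function name and its argument layout are assumptions made for illustration; only the element and attribute names (Query, Dataset, Attribute, Filter, virtualSchemaName, count) and the two-dataset limit come from the examples above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# (dataset name, attribute names, {filter name: filter value})
DatasetSpec = Tuple[str, List[str], Dict[str, str]]


def build_query(datasets: List[DatasetSpec],
                virtual_schema: str = "default",
                count: str = "0") -> str:
    """Build a Query XML string with Dataset blocks and no Links elements."""
    # The examples above allow at most two visible datasets per query.
    if not 1 <= len(datasets) <= 2:
        raise ValueError("expected one or two visible datasets")
    query = ET.Element(
        "Query", {"virtualSchemaName": virtual_schema, "count": count})
    for name, attributes, filters in datasets:
        dataset = ET.SubElement(query, "Dataset", {"name": name})
        for attribute in attributes:
            ET.SubElement(dataset, "Attribute", {"name": attribute})
        for filter_name, value in filters.items():
            ET.SubElement(
                dataset, "Filter", {"name": filter_name, "value": value})
    body = ET.tostring(query, encoding="unicode")
    # Same prolog as the hand-written examples.
    return '<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE Query>\n' + body


if __name__ == "__main__":
    print(build_query([
        ("hsapiens_gene_ensembl",
         ["ensembl_gene_id", "chromosome_name",
          "hsapiens_gene_ensembl_structure__str_chrom_name"],
         {"chr_name": "1"}),
        ("mmusculus_gene_ensembl",
         ["ensembl_gene_id", "chromosome_name"],
         {"chr_name": "1"}),
    ]))
```

Placeholder attributes and filters are passed like any others, using their prefixed names inside the visible dataset, which mirrors examples (i) to (iii).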
The other concept is that of formatters set on the Query. The formatter
defaults to TSV (tab-separated values), but you can also call
Query->formatter('HTML') using the API, or set the formatter attribute in
the Query tag of the XML, to get formatted output from either the API or
martservice.
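As a small sketch of the XML route, the following Python sets the formatter attribute on the Query tag of an existing query string. The helper name is an assumption for illustration; the attribute name and the 'HTML' value are the ones mentioned above, and no other formatter names are implied.

```python
import xml.etree.ElementTree as ET


def with_formatter(query_xml: str, formatter: str) -> str:
    """Return the query with the formatter attribute set on the Query tag."""
    root = ET.fromstring(query_xml)
    if root.tag != "Query":
        raise ValueError("expected a Query root element")
    root.set("formatter", formatter)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Query body only, without the XML declaration or DOCTYPE line.
    query = (
        '<Query virtualSchemaName="default" count="0">'
        '<Dataset name="hsapiens_gene_ensembl">'
        '<Attribute name="ensembl_gene_id" />'
        '</Dataset>'
        '</Query>'
    )
    print(with_formatter(query, "HTML"))
```

Leaving the attribute out keeps the default TSV output described above.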
Let me know what else you need. We should have a test server up for you
soon.
Best wishes
Damian
> David.
>
> --
> David Withers
> School of Computer Science, University of Manchester,
> Oxford Road, Manchester, M13 9PL, UK.
> Tel: +44(0)161 275 0145
>