On 12 Mar 2007, at 18:53, Julian Catchen wrote:
That worked like a charm -- I am now downloading my sequence data. The
web services interface is slick and a whole lot cleaner than the
previous URL GET method.
:-)
One detail with regard to the XML query builder: it doesn't add
formatting options to the output, even if you selected them in the web
interface, which might be useful to document a little better.
This is probably even better fixed rather than documented ;) I noticed
the same problem. Formatting options should be automatically added.
That leads me to one last question: how do I specify that I want the
downloaded data to be gzipped?
Not sure if this is currently supported, but let us look into it.
We'll see what we can do about it.
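Until server-side compression is confirmed, one workaround is to compress the file locally once the download has finished. This is only a minimal sketch in Python; the helper name gzip_file and the file names are hypothetical, and it does nothing to reduce the amount of data transferred.

```python
import gzip
import shutil


def gzip_file(src_path, dest_path=None):
    """Compress src_path with gzip and return the path of the compressed copy."""
    if dest_path is None:
        dest_path = src_path + ".gz"
    # Stream in chunks so large FASTA files are never held in memory at once.
    with open(src_path, "rb") as src, gzip.open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)
    return dest_path
```

For example, gzip_file("cdna.fasta") would write cdna.fasta.gz next to the original and leave the original in place.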
Last last question: is there a published DTD for the query format?
I am afraid not. We tend to rely on people compiling XML through
martview, and the format is really trivial, but we could certainly
publish one as you suggest.
a.
Thanks very much,
julian
Here is an example query that will produce a FASTA file of cDNA
sequences, in case anyone else was interested:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" Header = "1" formatter = "FASTA"
count = "" softwareVersion = "0.5" >
<Dataset name = "hsapiens_gene_ensembl" interface = "default" >
<Attribute name = "cdna" />
<Attribute name = "str_chrom_name" />
<Attribute name = "gene_stable_id_v" />
<Attribute name = "transcript_stable_id_v" />
<Attribute name = "translation_stable_id_v" />
<Attribute name = "transcript_chrom_strand" />
</Dataset>
</Query>
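Since the query builder does not always add the formatter attribute, it can be easier to generate the query in code. The sketch below uses Python's standard xml.etree.ElementTree to produce a query of the same shape as the one above. The function name build_query and its parameters are hypothetical; the element and attribute names are copied from the examples in this thread, not from a published schema.

```python
import xml.etree.ElementTree as ET


def build_query(dataset, attributes, filters=None, formatter=None):
    """Return a query string shaped like the martview XML in this thread."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "Header": "1",
        "count": "",
        "softwareVersion": "0.5",
    })
    if formatter:
        # The output format is requested on the Query element itself.
        query.set("formatter", formatter)
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE Query>\n' + body
```

Calling build_query("hsapiens_gene_ensembl", ["cdna", "str_chrom_name"], formatter="FASTA") gives a shortened version of the cDNA query above.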
Arek Kasprzyk wrote:
On 12 Mar 2007, at 18:00, Julian Catchen wrote:
Hi Arek,
Thanks very much for the reply. When I press the XML button from
within martview (looking for cDNA sequences for human) it only gives
me the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" Header = "1" count = ""
softwareVersion = "0.5" >
</Query>
The web interface still delivers the correct data, so I am fairly
sure I am asking for the right things, however, I am still unable to
construct an XML query that will give me FASTA-formatted sequence
data.
When I only ask for sequence IDs, such as in the example you posted
in your message below, the XML output works, and I get a proper copy
of the XML query.
Any additional help or documentation would be greatly appreciated--
julian
Hi Julian,
there seems to be a small but annoying bug in the XML dumping. If you
remove the 'biotype' attribute, the silly thing resets itself, giving
you an 'empty' XML as in your example above. We'll be dealing with
this problem shortly. Meanwhile, if you want to remove biotype you
need to re-check the attributes again to get a correct XML.
Annoyingly, this seems to affect only this particular header
attribute in sequences; the rest seems to be working fine.
Please give us a shout if you spot anything else,
a.
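Until the dumping bug is fixed, a quick sanity check before submitting can catch the 'empty' case. This is a minimal Python sketch; the helper name is_empty_query is hypothetical, and "empty" is taken here to mean that no Dataset element contains an Attribute, as in the XML quoted above.

```python
import xml.etree.ElementTree as ET


def is_empty_query(query_xml):
    """Return True if the query has no Dataset containing an Attribute."""
    root = ET.fromstring(query_xml.encode("utf-8"))
    return not any(ds.findall("Attribute") for ds in root.findall("Dataset"))
```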
Arek Kasprzyk wrote:
On 12 Mar 2007, at 03:48, Julian Catchen wrote:
Hello,
Does anyone have any example XML queries they are using to poll
the Ensembl biomart interface? I have gotten some simple examples
working that pull down lists of ensembl IDs by using examples from
the documentation. However, I can't seem to find any examples of
how to query for FASTA formatted cDNA sequences or translations.
Also, I can't find any documentation of how to request gzipped
data.
Hi Julian,
please go to www.biomart.org/biomart/martview, create your
favourite query using MView and click the XML button. This will
give you the exact XML format required for your web service query.
In principle, anything that you can do with MView you should also
be able to do with web service XML. If not, then we need to fix it.
In order to invoke a formatter you simply need to add
formatter="FASTA" to the Query element of your XML query.
For example the below query will give you peptides from chromosome
22 in FASTA format:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" Header = "1" count = ""
softwareVersion = "0.5" formatter="FASTA" >
<Dataset name = "hsapiens_gene_ensembl" interface = "default" >
<Attribute name = "peptide" />
<Attribute name = "str_chrom_name" />
<Attribute name = "gene_stable_id" />
<Attribute name = "biotype" />
<Filter name = "chromosome_name" value = "22"/>
</Dataset>
</Query>
you can run this query using the webExample.pl script:
http://cvs.sanger.ac.uk/cgi-bin/viewcvs.cgi/biomart-perl/scripts/webExample.pl?view=markup
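For anyone not using Perl, the same idea can be sketched in Python with only the standard library. Two things here are assumptions not stated in this thread: that the web service lives at /biomart/martservice next to martview, and that it accepts the XML in a form field named "query". Please check both against webExample.pl before relying on them. The function names are hypothetical.

```python
import shutil
import urllib.parse
import urllib.request

# Assumption: service path and form field name are not given in this thread.
MARTSERVICE_URL = "http://www.biomart.org/biomart/martservice"


def build_request(query_xml, url=MARTSERVICE_URL):
    """Wrap the query XML in a form-encoded POST request (nothing is sent)."""
    data = urllib.parse.urlencode({"query": query_xml}).encode("ascii")
    return urllib.request.Request(url, data=data)


def fetch(query_xml, out_path, url=MARTSERVICE_URL):
    """Send the query and stream the response body to out_path."""
    request = build_request(query_xml, url)
    with urllib.request.urlopen(request) as response, \
            open(out_path, "wb") as out:
        shutil.copyfileobj(response, out)
```

Building the request separately from sending it makes it possible to inspect what would be posted without touching the server.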
I used to have all of these automated through simple URL GET
queries that no longer seem to work with Ensembl post version 41.
URL GET queries still work, but the format has changed. We have not
yet documented it properly.
I am cc-ing your email to mart-dev so someone from there will send
you a few examples
hope that helps,
a.
Any pointers to examples or documentation for XML queries would be
appreciated.
Thanks,
julian
-------------------------------------------------------------------------------
Arek Kasprzyk
EMBL-European Bioinformatics Institute.
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
Tel: +44-(0)1223-494606
Fax: +44-(0)1223-494468
-------------------------------------------------------------------------------
--
Julian M Catchen
Computer and Information Science |
Institute of Neuroscience | [EMAIL PROTECTED]
University of Oregon | http://www.cs.uoregon.edu/~catchen/