Sorry, I forgot that there is also a third option.

You can contact the owners of the websites that your MartURLLocation entries 
point at and ask whether they will give you a copy of the registry file that 
they use internally. You can then copy the MartDBLocation entries from that 
file into your own registry, so that your local code accesses the databases 
directly and bypasses the web servers, which avoids the timeout.

This only works for those providers who allow direct external access to their 
database servers (I know that Ensembl do, but I'm not sure about others - you 
can connect to the Ensembl ones, which are MySQL databases, using username 
'anonymous', no password, on port 5316 of the server martdb.ensembl.org). 
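
The connection details above can be checked from Perl with DBI before touching 
the registry. This is only a minimal sketch and has not been run against the 
live server: the helper names mart_dsn and connect_mart are made up for 
illustration, the database name ensembl_mart_56 is an assumption taken from the 
registry file quoted below, and DBI plus DBD::mysql are assumed to be installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Host, port and user are the ones given in this message. The database name
# is an ASSUMPTION copied from the registry file quoted further down.
my %MART = (
    host     => 'martdb.ensembl.org',
    port     => 5316,
    user     => 'anonymous',
    password => '',
    database => 'ensembl_mart_56',
);

# Hypothetical helper: build a DBD::mysql data source name from the settings.
sub mart_dsn {
    my (%c) = @_;
    return "DBI:mysql:database=$c{database};host=$c{host};port=$c{port}";
}

# Hypothetical helper: open a connection and return the database handle.
# Dies with the driver's error message if the connection is refused.
sub connect_mart {
    my (%c) = @_;
    require DBI;    # loaded lazily, so mart_dsn() works without DBI installed
    return DBI->connect( mart_dsn(%c), $c{user}, $c{password},
        { RaiseError => 1, PrintError => 0 } );
}
```

Calling connect_mart(%MART) should return a handle if the server accepts the 
anonymous login; $dbh->ping and $dbh->disconnect can then be used as usual.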

cheers,
Richard

On 20 Nov 2009, at 20:11, Richard Holland wrote:

> Hello. 
> 
> 500 timeouts are usually caused by big queries that run for a long time. The 
> delay while such a query is processed can exceed the maximum response time 
> allowed by the web server you are communicating with via MartURLLocation. 
> 
> There are two solutions:
> 
> a) break your query down into smaller pieces (e.g. each with a smaller set of 
> protein IDs in your ensembl_peptide_id filter) then programmatically recombine 
> the results,
> 
> b) install a local mirror of the BioMart databases that you need so that you 
> can configure your registry to use direct database connections instead of 
> using MartURLLocations. Direct database connections are faster and do not 
> suffer from timeouts imposed by web servers.
> 
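
Option (a) above could look roughly like the following. It is a sketch under 
assumptions, not a tested recipe: chunk_ids and run_in_chunks are hypothetical 
helper names, the chunk size is left to the caller, and the BioMart calls are 
the ones from the script quoted below. It also assumes that concatenating the 
FASTA output of each chunk is an acceptable way to recombine the results.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a list of IDs into array refs of at most $size.
sub chunk_ids {
    my ( $size, @ids ) = @_;
    die "chunk size must be positive\n" unless $size > 0;
    my @chunks;
    push @chunks, [ splice( @ids, 0, $size ) ] while @ids;
    return @chunks;
}

# Hypothetical helper: run one small query per chunk and print each result
# in turn, so that no single request runs long enough to hit the timeout.
sub run_in_chunks {
    my ( $registry, $size, @ids ) = @_;
    require BioMart::Query;
    require BioMart::QueryRunner;

    for my $chunk ( chunk_ids( $size, @ids ) ) {
        my $query = BioMart::Query->new(
            'registry'          => $registry,
            'virtualSchemaName' => 'default',
        );
        $query->setDataset('btaurus_gene_ensembl');
        $query->addFilter( 'ensembl_peptide_id', $chunk );
        $query->addAttribute($_)
            for qw(ensembl_gene_id ensembl_transcript_id coding ensembl_peptide_id);
        $query->formatter('FASTA');

        my $runner = BioMart::QueryRunner->new();
        $runner->execute($query);
        $runner->printHeader();
        $runner->printResults();
        $runner->printFooter();
    }
    return;
}
```

With $registry obtained from BioMart::Initializer as in the quoted script, 
run_in_chunks($registry, 500, @peptide_ids) would issue one request per 500 
IDs; the value 500 is only an example and may need tuning.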
> cheers,
> Richard
> 
> On 20 Nov 2009, at 19:00, Chris Grassa wrote:
> 
>> Hello,
>> 
>> I have been having some trouble downloading sequence data via BioMart's Perl 
>> API.  I've been trying to obtain sets of coding sequences (maybe on the 
>> order of 35MB or so each), but every time I execute the script, the 
>> following error is returned:
>> 
>> Problems with the web server: 500 read timeout
>> 
>> I seem to be getting exactly 22 sequences every time, instead of the 20,000 
>> or so requested.  I certainly would appreciate any help you may be able to 
>> offer.  I have included the code I am using below, which I mostly copied 
>> from the Perl button on the MartView website.  Below the Perl, I have 
>> included the XML contained in the registry file.  Perhaps the data are 
>> available from a host receiving less traffic?
>> 
>> Regards and best wishes,
>> 
>> Chris Grassa
>> 
>> S. Tonia Hsieh Lab
>> University of Florida
>> 
>> 
>> 
>> 
>> #!/usr/bin/perl -w
>> 
>> # An example script demonstrating the use of BioMart API.
>> # This perl API representation is only available for configuration
>> # versions >= 0.5
>> use strict;
>> use BioMart::Initializer;
>> use BioMart::Query;
>> use BioMart::QueryRunner;
>> 
>> my $confFile = 
>> "/home/grassa/src/biomart-perl/conf/ensembl_mart_56_registry.xml";
>> #
>> # NB: change action to 'clean' if you wish to start a fresh configuration
>> # and to 'cached' if you want to skip configuration step on subsequent runs
>> # from the same registry
>> #
>> 
>> my $action='cached';
>> my $initializer = BioMart::Initializer->new('registryFile'=>$confFile, 
>> 'action'=>$action);
>> my $registry = $initializer->getRegistry;
>> 
>> my $query = 
>> BioMart::Query->new('registry'=>$registry,'virtualSchemaName'=>'default');
>> 
>> 
>>      $query->setDataset("btaurus_gene_ensembl");
>>      $query->addFilter("ensembl_peptide_id", [large array of Ensembl protein IDs]);
>>      $query->addAttribute("ensembl_gene_id");
>>      $query->addAttribute("ensembl_transcript_id");
>>      $query->addAttribute("coding");
>>      $query->addAttribute("ensembl_peptide_id");
>> 
>> $query->formatter("FASTA");
>> 
>> my $query_runner = BioMart::QueryRunner->new();
>> ############################## GET COUNT ############################
>> # $query->count(1);
>> # $query_runner->execute($query);
>> # print $query_runner->getCount();
>> #####################################################################
>> 
>> 
>> ############################## GET RESULTS ##########################
>> # to obtain unique rows only
>> # $query_runner->uniqueRowsOnly(1);
>> 
>> $query_runner->execute($query);
>> $query_runner->printHeader();
>> $query_runner->printResults();
>> $query_runner->printFooter();
>> #####################################################################
>> 
>> 
>> 
>> 
>> 
>> <?xml version="1.0" encoding="UTF-8"?>
>> <!DOCTYPE MartRegistry>
>> <MartRegistry>
>> <MartURLLocation database="ensembl_mart_56" default="1" displayName="ENSEMBL 
>> 56 GENES (SANGER UK)" host="www.biomart.org" includeDatasets="" martUser="" 
>> name="ensembl" path="/biomart/martservice" port="80" 
>> serverVirtualSchema="default" visible="1" />
>> <MartURLLocation database="snp_mart_56" default="0" displayName="ENSEMBL 56 
>> VARIATION  (SANGER UK)" host="www.biomart.org" includeDatasets="" 
>> martUser="" name="snp" path="/biomart/martservice" port="80" 
>> serverVirtualSchema="default" visible="1" />
>> <MartURLLocation database="functional_genomics_mart_56" default="0" 
>> displayName="ENSEMBL 56 FUNCTIONAL GENOMICS (SANGER UK)" 
>> host="www.biomart.org" includeDatasets="" martUser="" 
>> name="functional_genomics" path="/biomart/martservice" port="80" 
>> serverVirtualSchema="default" visible="1" />
>> <MartURLLocation database="vega_mart_56" default="0" displayName="VEGA 36  
>> (SANGER UK)" host="www.biomart.org" includeDatasets="" martUser="" 
>> name="vega" path="/biomart/martservice" port="80" 
>> serverVirtualSchema="default" visible="1" />
>> <MartURLLocation database="genomic_features_mart_56" default="0" 
>> displayName="ENSEMBL 56 GENOMIC FEATURES (SANGER UK)" host="www.biomart.org" 
>> includeDatasets="" martUser="" name="genomic_features" 
>> path="/biomart/martservice" port="80" serverVirtualSchema="default" 
>> visible="0" />
>> <MartURLLocation database="ontology_mart_56" default="0" 
>> displayName="ENSEMBL 56 ONTOLOGY (SANGER UK)" host="www.biomart.org" 
>> includeDatasets="" martUser="" name="ontology" path="/biomart/martservice" 
>> port="80" serverVirtualSchema="default" visible="0" />
>> <MartURLLocation database="sequence_mart_56" default="0" 
>> displayName="ENSEMBL 56 SEQUENCE (SANGER UK)" host="www.biomart.org" 
>> includeDatasets="" martUser="" name="sequence" path="/biomart/martservice" 
>> port="80" serverVirtualSchema="default" visible="0" />
>> </MartRegistry>
>> 
>> 
>> 
>> 
>> 
>> --
> 
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holl...@eaglegenomics.com
> http://www.eaglegenomics.com/
> 

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holl...@eaglegenomics.com
http://www.eaglegenomics.com/
