Re: [mart-dev] Perl API - 500 read timeout
Isn't it easier to download the coding sequences from the Ensembl FTP server and then post-process them on your own machine? But you might have tried that already.

cheers
Max
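The post-processing step suggested here could look roughly like the sketch below: it reads a FASTA file on standard input and keeps only the records whose ID is in a list. This is an illustration, not something posted in the thread. The script name, the ID-file layout (one ID per line) and the assumption that the first word of each FASTA header is the ID to match (with any ".N" version suffix ignored) are all hypothetical, so check which ID type the downloaded headers actually use before relying on it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy FASTA records whose ID (first word of the header line) is a key of
# %$wanted from $in to $out. Returns the number of records written.
sub filter_fasta {
    my ($in, $out, $wanted) = @_;
    my ($keep, $count) = (0, 0);
    while (my $line = <$in>) {
        if ($line =~ /^>(\S+)/) {
            # Start of a new record: drop a trailing version suffix such as ".1"
            (my $id = $1) =~ s/\.\d+$//;
            $keep = exists $wanted->{$id};
            $count++ if $keep;
        }
        # Header and sequence lines are both copied while $keep is true.
        print {$out} $line if $keep;
    }
    return $count;
}

# Hypothetical usage: perl filter_fasta.pl ids.txt < all_cds.fa > subset.fa
if (@ARGV) {
    open my $ids, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    chomp(my @ids = <$ids>);
    close $ids;
    my %wanted = map { $_ => 1 } grep { length } @ids;
    my $n = filter_fasta(\*STDIN, \*STDOUT, \%wanted);
    warn "wrote $n of ", scalar(keys %wanted), " requested records\n";
}
```

The count printed on standard error makes it easy to spot IDs that were requested but not found in the downloaded file.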
Re: [mart-dev] Perl API - 500 read timeout
Sorry, forgot there is also a third option...

You can contact the owners of the MartURLLocation websites you are pointing at to see if they will give you a copy of the registry file that they use internally. You can then copy the MartDBLocation entries from that registry file into your own registry, so that your local code accesses the databases directly and bypasses the web servers, thus avoiding the timeout.

This only works for those providers who allow direct external access to their database servers. I know that Ensembl do, but I'm not sure about others. The Ensembl ones are MySQL databases; you can connect to them using username 'anonymous', no password, on port 5316 of the server martdb.ensembl.org.

cheers,
Richard

On 20 Nov 2009, at 20:11, Richard Holland wrote:

> Hello.
>
> 500 timeouts are usually caused by big queries that run for a long time. The
> reason is that there can be a delay during processing these queries that
> exceeds the maximum allowable response time for the web server you are
> communicating with via MartURLLocation.
>
> There are two solutions:
>
> a) break your query down into smaller pieces (e.g. each with a smaller set of
> protein IDs in your ensembl_peptide_id filter) then programmatically recombine
> the results,
>
> b) install a local mirror of the BioMart databases that you need so that you
> can configure your registry to use direct database connections instead of
> using MartURLLocations. Direct database connections are faster and do not
> suffer from timeouts imposed by web servers.
>
> cheers,
> Richard
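Before editing a registry, it may be worth confirming that the direct connection described in this message works from your machine. The sketch below uses the host, port and anonymous login quoted above and the standard DBI module with the DBD::mysql driver (both must be installed separately). The helper names and the --list switch are made up for illustration, and whether the server still accepts connections is not something this thread can confirm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Connection details as given in the message above.
my %mart = (
    host     => 'martdb.ensembl.org',
    port     => 5316,
    user     => 'anonymous',
    password => '',
);

# Build a DBI data source name for a MySQL server (no default database).
sub mart_dsn {
    my (%p) = @_;
    return "DBI:mysql:host=$p{host};port=$p{port}";
}

# Connect and return the names of the databases visible to this login.
sub list_mart_databases {
    my (%p) = @_;
    require DBI;    # loaded lazily so mart_dsn() works without DBI installed
    my $dbh = DBI->connect(mart_dsn(%p), $p{user}, $p{password},
                           { RaiseError => 1, PrintError => 0 });
    my $names = $dbh->selectcol_arrayref('SHOW DATABASES');
    $dbh->disconnect;
    return @$names;
}

# Hypothetical usage: perl list_marts.pl --list
if (@ARGV && $ARGV[0] eq '--list') {
    print "$_\n" for list_mart_databases(%mart);
}
```

If the connection succeeds, the database names it prints are the ones a provider's MartDBLocation entries would be expected to point at.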
Re: [mart-dev] Perl API - 500 read timeout
Hello.

500 timeouts are usually caused by big queries that run for a long time. The reason is that there can be a delay while processing these queries that exceeds the maximum allowable response time for the web server you are communicating with via MartURLLocation.

There are two solutions:

a) break your query down into smaller pieces (e.g. each with a smaller set of protein IDs in your ensembl_peptide_id filter) then programmatically recombine the results,

b) install a local mirror of the BioMart databases that you need so that you can configure your registry to use direct database connections instead of using MartURLLocations. Direct database connections are faster and do not suffer from timeouts imposed by web servers.

cheers,
Richard

On 20 Nov 2009, at 19:00, Chris Grassa wrote:

> Hello,
>
> I have been having some trouble downloading sequence data via BioMart's Perl
> API. I've been trying to obtain sets of coding sequences (maybe on the order
> of 35MB or so each), but every time I execute the script, the following error
> is returned:
>
> Problems with the web server: 500 read timeout
>
> I seem to be getting exactly 22 sequences every time, instead of the 20,000
> or so requested. I certainly would appreciate any help you may be able to
> offer. I have included the code I am using below, which I mostly copied from
> the Perl button on the Martview website. Below the perl, I have included the
> XML contained in the registry file. Perhaps the data are available from a
> host receiving less traffic?
>
> Regards and best wishes,
>
> Chris Grassa
>
> S. Tonia Hsieh Lab
> University of Florida
>
> #!/usr/bin/perl -w
>
> # An example script demonstrating the use of BioMart API.
> # This perl API representation is only available for configuration versions >= 0.5
> use strict;
> use BioMart::Initializer;
> use BioMart::Query;
> use BioMart::QueryRunner;
>
> my $confFile = "/home/grassa/src/biomart-perl/conf/ensembl_mart_56_registry.xml";
> #
> # NB: change action to 'clean' if you wish to start a fresh configuration
> # and to 'cached' if you want to skip configuration step on subsequent runs from the same registry
> #
>
> my $action='cached';
> my $initializer = BioMart::Initializer->new('registryFile'=>$confFile, 'action'=>$action);
> my $registry = $initializer->getRegistry;
>
> my $query = BioMart::Query->new('registry'=>$registry,'virtualSchemaName'=>'default');
>
> $query->setDataset("btaurus_gene_ensembl");
> $query->addFilter("ensembl_peptide_id", [large array of Ensembl protein IDs]);
> $query->addAttribute("ensembl_gene_id");
> $query->addAttribute("ensembl_transcript_id");
> $query->addAttribute("coding");
> $query->addAttribute("ensembl_peptide_id");
>
> $query->formatter("FASTA");
>
> my $query_runner = BioMart::QueryRunner->new();
> ## GET COUNT
> # $query->count(1);
> # $query_runner->execute($query);
> # print $query_runner->getCount();
> #
>
> ## GET RESULTS ##
> # to obtain unique rows only
> # $query_runner->uniqueRowsOnly(1);
>
> $query_runner->execute($query);
> $query_runner->printHeader();
> $query_runner->printResults();
> $query_runner->printFooter();
> #
>
> [...] name="ensembl" path="/biomart/martservice" port="80"
>   serverVirtualSchema="default" visible="1" />
> [...] martUser="" name="snp" path="/biomart/martservice" port="80"
>   serverVirtualSchema="default" visible="1" />
> [...] displayName="ENSEMBL 56 FUNCTIONAL GENOMICS (SANGER UK)"
>   host="www.biomart.org" includeDatasets="" martUser=""
>   name="functional_genomics" path="/biomart/martservice" port="80"
>   serverVirtualSchema="default" visible="1" />
> [...] name="vega" path="/biomart/martservice" port="80"
>   serverVirtualSchema="default" visible="1" />
> [...] displayName="ENSEMBL 56 GENOMIC FEATURES (SANGER UK)" host="www.biomart.org"
>   includeDatasets="" martUser="" name="genomic_features"
>   path="/biomart/martservice" port="80" serverVirtualSchema="default"
>   visible="0" />
> [...] displayName="ENSEMBL 56 ONTOLOGY (SANGER UK)" host="www.biomart.org"
>   includeDatasets="" martUser="" name="ontology" path="/biomart/martservice"
>   port="80" serverVirtualSchema="default" visible="0" />
> [...] displayName="ENSEMBL 56 SEQUENCE (SANGER UK)" host="www.biomart.org"
>   includeDatasets="" martUser="" name="sequence" path="/biomart/martservice"
>   port="80" serverVirtualSchema="default" visible="0" />

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holl...@eaglegenomics.com
http://www.eaglegenomics.com/
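Solution (a) above, splitting the ensembl_peptide_id filter into smaller batches, might be sketched as follows. The BioMart calls are the same ones used in the quoted script; everything else (the script name, the helper names, the batch size of 500, the one-ID-per-line input file) is an assumption made for illustration. Only printResults() is called per batch, on the further assumption that FASTA output needs no header or footer, so the batches recombine simply by being printed one after another.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a list of IDs into array refs of at most $size elements each.
sub batches {
    my ($size, @ids) = @_;
    die "batch size must be at least 1\n" if $size < 1;
    my @out;
    push @out, [ splice @ids, 0, $size ] while @ids;
    return @out;
}

# Call $handler once per batch, in the original order.
sub for_each_batch {
    my ($size, $ids, $handler) = @_;
    $handler->($_) for batches($size, @$ids);
    return;
}

# Run one small query for a single batch and print its results to STDOUT.
sub fetch_batch {
    my ($registry, $batch) = @_;
    require BioMart::Query;          # loaded lazily; needs biomart-perl
    require BioMart::QueryRunner;
    my $query = BioMart::Query->new('registry' => $registry,
                                    'virtualSchemaName' => 'default');
    $query->setDataset("btaurus_gene_ensembl");
    $query->addFilter("ensembl_peptide_id", $batch);
    $query->addAttribute($_)
        for qw(ensembl_gene_id ensembl_transcript_id coding ensembl_peptide_id);
    $query->formatter("FASTA");
    my $runner = BioMart::QueryRunner->new();
    $runner->execute($query);
    $runner->printResults();
    return;
}

# Hypothetical usage: perl batch_query.pl registry.xml ids.txt > coding.fa
if (@ARGV == 2) {
    my ($conf_file, $id_file) = @ARGV;
    require BioMart::Initializer;
    open my $fh, '<', $id_file or die "Cannot open $id_file: $!";
    chomp(my @ids = <$fh>);
    close $fh;
    @ids = grep { length } @ids;
    my $registry = BioMart::Initializer->new('registryFile' => $conf_file,
                                             'action' => 'cached')->getRegistry;
    for_each_batch(500, \@ids, sub { fetch_batch($registry, $_[0]) });
}
```

A smaller batch size means more round trips but a shorter wait per request; if a single batch still times out, lowering the 500 is the first thing to try.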