Re: [mart-dev] Perl API - 500 read timeout

2009-11-20 Thread Maximilian Haussler
Isn't it easier to download the coding sequences from the Ensembl FTP server
and then post-process them on your own machine? But you might have tried
that already.
cheers
Max


Re: [mart-dev] Perl API - 500 read timeout

2009-11-20 Thread Richard Holland
Sorry, forgot there is also a third option...

You can contact the owners of the MartURLLocation websites you are pointing at
to see if they will give you a copy of the registry file that they use
internally. You can then copy the MartDBLocation entries from that registry file
into your own registry, so that your local code accesses the databases directly
and bypasses the web servers, thus avoiding the timeouts.

This only works for those providers who allow direct external access to their 
database servers (I know that Ensembl do, but I'm not sure about others - you 
can connect to the Ensembl ones, which are MySQL databases, using username 
'anonymous', no password, on port 5316 of the server martdb.ensembl.org). 
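
As a rough illustration of what such a registry entry might look like, the
Perl sketch below prints a MartDBLocation element for the Ensembl server
mentioned above. The attribute names are assumed to follow the usual BioMart
registry layout, the database/schema name ensembl_mart_56 is only a guess
based on the registry file name in the original post, and the helper
registry_element is hypothetical. Check everything against the registry file
the provider actually gives you.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: render one registry element from an ordered list of
# attribute name/value pairs, escaping the characters that matter in XML
# attribute values.
sub registry_element {
    my ($tag, @attrs) = @_;
    my @parts;
    while (my ($name, $value) = splice(@attrs, 0, 2)) {
        $value =~ s/&/&amp;/g;
        $value =~ s/"/&quot;/g;
        $value =~ s/</&lt;/g;
        push @parts, qq{$name="$value"};
    }
    return "<$tag " . join(' ', @parts) . " />";
}

# Host, port and user come from the message above. The database and schema
# names are ASSUMPTIONS and must be replaced with the real ones.
my $location = registry_element(
    'MartDBLocation',
    name            => 'ensembl',
    displayName     => 'ENSEMBL 56 (direct MySQL)',
    databaseType    => 'mysql',
    host            => 'martdb.ensembl.org',
    port            => '5316',
    database        => 'ensembl_mart_56',
    schema          => 'ensembl_mart_56',
    user            => 'anonymous',
    password        => '',
    visible         => '1',
    default         => '1',
    includeDatasets => '',
    martUser        => '',
);

print "$location\n";
```

The printed element would replace the corresponding MartURLLocation line in
the registry file, after which the script should be run once with the action
set to 'clean' so that the configuration is rebuilt.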

cheers,
Richard


Re: [mart-dev] Perl API - 500 read timeout

2009-11-20 Thread Richard Holland
Hello. 

500 timeouts are usually caused by big queries that run for a long time. While
such a query is being processed there can be a delay that exceeds the maximum
response time allowed by the web server you are communicating with via
MartURLLocation.

There are two solutions:

a) break your query down into smaller pieces (e.g. each with a smaller set of
protein IDs in your ensembl_peptide_id filter) then programmatically recombine
the results,

b) install a local mirror of the BioMart databases that you need so that you 
can configure your registry to use direct database connections instead of using 
MartURLLocations. Direct database connections are faster and do not suffer from 
timeouts imposed by web servers.
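
Option (a) could be sketched roughly as follows. This is only a minimal
illustration, not a tested recipe: chunk_ids and run_in_chunks are
hypothetical helper names, the batch size of 500 is an arbitrary starting
point, and the BioMart calls in the usage comment simply mirror the script
from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a list of IDs into batches of at most $size elements each.
sub chunk_ids {
    my ($ids, $size) = @_;
    die "chunk size must be a positive integer\n" unless $size && $size > 0;
    my @pending = @$ids;    # work on a copy so the caller's array is untouched
    my @chunks;
    push @chunks, [ splice(@pending, 0, $size) ] while @pending;
    return @chunks;
}

# Call $run_batch once per batch. $run_batch is a code ref that receives an
# array ref of IDs and is expected to run one small query for those IDs.
# Returns the number of batches that were run.
sub run_in_chunks {
    my ($ids, $size, $run_batch) = @_;
    my $batches = 0;
    for my $chunk (chunk_ids($ids, $size)) {
        $run_batch->($chunk);
        $batches++;
    }
    return $batches;
}

# Hypothetical usage with the script from the original post, where $registry
# comes from BioMart::Initializer as before and @peptide_ids holds the full
# list of Ensembl protein IDs:
#
#   run_in_chunks(\@peptide_ids, 500, sub {
#       my ($chunk) = @_;
#       my $query = BioMart::Query->new('registry' => $registry,
#                                       'virtualSchemaName' => 'default');
#       $query->setDataset("btaurus_gene_ensembl");
#       $query->addFilter("ensembl_peptide_id", $chunk);
#       $query->addAttribute($_)
#           for qw(ensembl_gene_id ensembl_transcript_id coding ensembl_peptide_id);
#       $query->formatter("FASTA");
#       my $query_runner = BioMart::QueryRunner->new();
#       $query_runner->execute($query);
#       $query_runner->printResults();
#   });
```

Because each batch prints its own FASTA records, recombining the results
amounts to concatenating the output. If a batch still times out, a smaller
batch size is the obvious thing to try.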

cheers,
Richard

On 20 Nov 2009, at 19:00, Chris Grassa wrote:

> Hello,
> 
> I have been having some trouble downloading sequence data via BioMart's Perl 
> API.  I've been trying to obtain sets of coding sequences (maybe on the order 
> of 35MB or so each), but every time I execute the script, the following error 
> is returned:
> 
> Problems with the web server: 500 read timeout
> 
> I seem to be getting exactly 22 sequences every time, instead of the 20,000 
> or so requested.  I certainly would appreciate any help you may be able to 
> offer.  I have included the code I am using below, which I mostly copied from 
> the Perl button on the MartView website.  Below the Perl, I have included the 
> XML contained in the registry file.  Perhaps the data are available from a 
> host receiving less traffic?
> 
> Regards and best wishes,
> 
> Chris Grassa
> 
> S. Tonia Hsieh Lab
> University of Florida
> 
> 
> 
> 
> #!/usr/bin/perl -w
> 
> # An example script demonstrating the use of BioMart API.
> # This perl API representation is only available for configuration versions >= 0.5
> use strict;
> use BioMart::Initializer;
> use BioMart::Query;
> use BioMart::QueryRunner;
> 
> my $confFile = "/home/grassa/src/biomart-perl/conf/ensembl_mart_56_registry.xml";
> #
> # NB: change action to 'clean' if you wish to start a fresh configuration
> # and to 'cached' if you want to skip configuration step on subsequent runs
> # from the same registry
> #
> 
> my $action='cached';
> my $initializer = BioMart::Initializer->new('registryFile'=>$confFile, 
> 'action'=>$action);
> my $registry = $initializer->getRegistry;
> 
> my $query = 
> BioMart::Query->new('registry'=>$registry,'virtualSchemaName'=>'default');
> 
> 
>   $query->setDataset("btaurus_gene_ensembl");
>   $query->addFilter("ensembl_peptide_id", [large array of Ensembl protein IDs]);
>   $query->addAttribute("ensembl_gene_id");
>   $query->addAttribute("ensembl_transcript_id");
>   $query->addAttribute("coding");
>   $query->addAttribute("ensembl_peptide_id");
> 
> $query->formatter("FASTA");
> 
> my $query_runner = BioMart::QueryRunner->new();
> ## GET COUNT 
> # $query->count(1);
> # $query_runner->execute($query);
> # print $query_runner->getCount();
> #
> 
> 
> ## GET RESULTS ##
> # to obtain unique rows only
> # $query_runner->uniqueRowsOnly(1);
> 
> $query_runner->execute($query);
> $query_runner->printHeader();
> $query_runner->printResults();
> $query_runner->printFooter();
> #
> 
> <MartURLLocation ... name="ensembl" path="/biomart/martservice" port="80"
>   serverVirtualSchema="default" visible="1" />
> <MartURLLocation ... name="snp" path="/biomart/martservice" port="80"
>   serverVirtualSchema="default" visible="1" />
> <MartURLLocation ... displayName="ENSEMBL 56 FUNCTIONAL GENOMICS (SANGER UK)"
>   host="www.biomart.org" includeDatasets="" martUser=""
>   name="functional_genomics" path="/biomart/martservice" port="80"
>   serverVirtualSchema="default" visible="1" />
> <MartURLLocation ... name="vega" path="/biomart/martservice" port="80"
>   serverVirtualSchema="default" visible="1" />
> <MartURLLocation ... displayName="ENSEMBL 56 GENOMIC FEATURES (SANGER UK)"
>   host="www.biomart.org" includeDatasets="" martUser="" name="genomic_features"
>   path="/biomart/martservice" port="80" serverVirtualSchema="default"
>   visible="0" />
> <MartURLLocation ... displayName="ENSEMBL 56 ONTOLOGY (SANGER UK)"
>   host="www.biomart.org" includeDatasets="" martUser="" name="ontology"
>   path="/biomart/martservice" port="80" serverVirtualSchema="default"
>   visible="0" />
> <MartURLLocation ... displayName="ENSEMBL 56 SEQUENCE (SANGER UK)"
>   host="www.biomart.org" includeDatasets="" martUser="" name="sequence"
>   path="/biomart/martservice" port="80" serverVirtualSchema="default"
>   visible="0" />
> 
> --

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holl...@eaglegenomics.com
http://www.eaglegenomics.com/