Hi Elfar,

The best approach is to download them using the web browser's Export (email) option. This compiles the results on the server side and then sends you a download link by email.

Best,
Syed


Elfar Torarinsson wrote:
Hi,

I was trying to automate regular downloads of human CDS (and UTRs)
using biomart. I have tried it using the perl script generated at
biomart:

use strict;
use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

my $confFile = 
"/home/projects/ensembl/biomart-perl/conf/apiExampleRegistry.xml";
my $action='cached';
my $initializer = BioMart::Initializer->new('registryFile'=>$confFile,
'action'=>$action);
my $registry = $initializer->getRegistry;

my $query = 
BioMart::Query->new('registry'=>$registry,'virtualSchemaName'=>'default');

$query->setDataset("hsapiens_gene_ensembl");
$query->addAttribute("ensembl_gene_id");
$query->addAttribute("ensembl_transcript_id");
$query->addAttribute("coding");
$query->addAttribute("external_gene_id");

$query->formatter("FASTA");

my $query_runner = BioMart::QueryRunner->new();
# to obtain unique rows only
$query_runner->uniqueRowsOnly(1);

$query_runner->execute($query);
$query_runner->printHeader();
$query_runner->printResults();
$query_runner->printFooter();

This retrieves only a few sequences and then starts returning
"Problems with the web server: 500 read timeout".

I have also tried posting the XML using LWP in Perl. This downloads
more sequences, but it also stops after a while, before all the
sequences have been downloaded:

use strict;
use LWP::UserAgent;
open (FH,$ARGV[0]) || die ("\nUsage: perl postXML.pl Query.xml\n\n");
my $xml = '';
while (<FH>){
    $xml .= $_;
}
close(FH);

my $path="http://www.biomart.org/biomart/martservice?";
my $request = 
HTTP::Request->new("POST",$path,HTTP::Headers->new(),'query='.$xml."\n");
my $ua = LWP::UserAgent->new;
$ua->timeout(30000000);
my $response;

$ua->request($request,
             sub{
                 my($data, $response) = @_;
                 if ($response->is_success) {
                     print "$data";
                 }
                 else {
                     warn ("Problems with the web server:
".$response->status_line);
                 }
             },500);
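
For reference, the Query.xml posted by that script might look something like the sketch below. This is not from the original message; it is reconstructed from the dataset, attributes, and FASTA formatter used in the API script above, so treat the exact element and attribute names as assumptions to check against what the BioMart web interface generates:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<!-- uniqueRows="1" mirrors uniqueRowsOnly(1) in the API script -->
<Query virtualSchemaName="default" formatter="FASTA" header="0"
       uniqueRows="1" count="">
  <Dataset name="hsapiens_gene_ensembl" interface="default">
    <Attribute name="ensembl_gene_id"/>
    <Attribute name="ensembl_transcript_id"/>
    <Attribute name="coding"/>
    <Attribute name="external_gene_id"/>
  </Dataset>
</Query>
```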

I have managed to download all the sequences using the browser before,
but it required several tries, and I had to get them gzipped (which
also let me verify, when gunzipping, that I had received all of them).

So, my question is: is there anything I can do to download all the
sequences reliably? I.e. avoid the timeouts, find some easy and
systematic way to split my query into much smaller calls, or something else?
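
One systematic way to split the query is to run one small query per chromosome by adding a chromosome_name filter, so each call finishes before the server times out. The sketch below is untested against the live server: the addFilter call follows the style of scripts generated by biomart-perl, and the chromosome list is an assumption; adjust both to your setup.

```perl
use strict;
use warnings;
use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

my $confFile =
    "/home/projects/ensembl/biomart-perl/conf/apiExampleRegistry.xml";
my $initializer = BioMart::Initializer->new('registryFile' => $confFile,
                                            'action'       => 'cached');
my $registry     = $initializer->getRegistry;
my $query_runner = BioMart::QueryRunner->new();
$query_runner->uniqueRowsOnly(1);

# One small query per chromosome instead of one huge query.
# Chromosome names are assumed; adjust to the karyotype you need.
for my $chr (1 .. 22, 'X', 'Y', 'MT') {
    my $query = BioMart::Query->new('registry'          => $registry,
                                    'virtualSchemaName' => 'default');
    $query->setDataset("hsapiens_gene_ensembl");
    $query->addFilter("chromosome_name", ["$chr"]);  # restrict this call
    $query->addAttribute("ensembl_gene_id");
    $query->addAttribute("ensembl_transcript_id");
    $query->addAttribute("coding");
    $query->addAttribute("external_gene_id");
    $query->formatter("FASTA");

    $query_runner->execute($query);
    $query_runner->printResults();  # append this chunk's FASTA to stdout
}
```

Because each chunk either completes or fails independently, a failed chromosome can simply be retried without re-downloading everything else.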

Thanks,

Elfar
