Hi Syed,

Two weeks ago, I asked a question about joining two large datasets that
took a long time to execute.  
Your suggestion was to swap the dataset order in the query.  It did
improve the performance.

However, one of our use cases requires a filter on the first
dataset.
It seems that if there is a filter on the first dataset, the execution
time is long.

I am not concerned about performance at this point.  I can use the
"email notification" option to wait for the results.

But it eventually gives me an exception.  From the log, the
program printed "Got no results" after executing the SQL for the first
dataset.
Can I terminate the iteration at that point?
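
To illustrate the behaviour I am asking about, here is a minimal sketch.
It is written in Python rather than BioMart's Perl, and
fetch_first_batch and fetch_second are hypothetical stand-ins for the
two dataset queries, not BioMart functions. The idea is simply to stop
as soon as a batch from the first dataset comes back empty, instead of
building a second query with an empty IN ('') clause.

```python
def join_in_batches(fetch_first_batch, fetch_second, batch_size=50000):
    """Join two datasets batch by batch, stopping when the first runs out.

    fetch_first_batch(limit, offset) -> list of keys from the first dataset
    fetch_second(keys)               -> list of matching rows from the second
    Both callables are hypothetical placeholders for the real queries.
    """
    results = []
    offset = 0
    while True:
        keys = fetch_first_batch(batch_size, offset)
        if not keys:
            # "Got no results": the first dataset is exhausted, so end the
            # iteration here rather than querying with an empty key list.
            break
        results.extend(fetch_second(keys))
        offset += batch_size
    return results
```

With this kind of guard, the second dataset is only ever queried with a
non-empty key list.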

Thanks,
Denny 



-----Original Message-----
From: Syed Haider [mailto:[EMAIL PROTECTED] 
Sent: Thursday, May 29, 2008 8:50 AM
To: Chan, Denny (NIH/NCI) [C]
Cc: [email protected]
Subject: Re: [mart-dev] Joining two datasets

Hi Denny,

can you execute the same query in reverse order - by swapping the
dataset order ? what happens when you do this ?

syed




On Thu, 2008-05-29 at 08:02 -0400, Chan, Denny (NIH/NCI) [C] wrote:
> When joining two datasets, BioMart ran a batch iteration over both
> datasets. But when it reached the end of the first dataset, it still
> tried to query the second dataset with an invalid SQL statement.  Here
> is what is in the log file:
> 
> 
> ========================================================================
> BioMart.Dataset.TableSet:735:WARN> QUERY SQL:  SELECT main.seqid,
> main.charge FROM cpas2biomart.peptidesview__peptidesview__main main
> LIMIT 50000 OFFSET 9815643
> BioMart.DatasetI:1175:DEBUG> Got no results
> BioMart.DatasetI:1261:DEBUG> Attribute hash
> BioMart.DatasetI:1262:DEBUG> Before hash: 0
> BioMart.DatasetI:1269:DEBUG> After hash: 0
> BioMart.Dataset.TableSet:735:WARN> QUERY SQL:  SELECT main.seqid_key,
> main.bestname, main.length, main.mass, main.description,
> main.seqid_key FROM
> cpas2biomart.protsequencesview__protsequences__main main WHERE
> (main.seqid_key = '96305') AND (main.seqid_key IN('')) LIMIT 400
> DBD::Pg::st execute failed: ERROR:  invalid input syntax for integer: ""
> BioMart.Web:2228:DEBUG> Serious error: Error during query execution:
> ERROR:  invalid input syntax for integer: ""
> ========================================================================
> 
> 
> The first dataset "peptidesview__peptidesview" has 9815643 records,
> so the first SQL statement (OFFSET 9815643) returns zero records,
> which leads to an empty IN clause in the second SQL statement.
> 
> 
> Does anyone know the fix for this problem?
> 
> Thanks,
> Denny Chan
-- 
======================================
Syed Haider.
EMBL-European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton,
Cambridge CB10 1SD, UK.
======================================
