It takes four hours to run the query?  I have written queries affecting 
grouping and sorting 6 billion rows that ran several orders of magnitude faster 
than that.

I suspect that you need to consider appropriate indexes to support your where 
clauses?

If you truly working with data of that billion-record magnitude, your heap is 
too big to query this way.  You'll need to take an alternative approach such as 
denormalizing the data store into derived data closer to the final form that 
will be needed, then maintain the denormalized data to be in sync.  (That 
denormalization could take the form of creating a smaller summary table, which 
could be queried with DBIx::Class - I did that for statistics reporting before. 
 Though I could query the stats directly, it was very expensive.)

David

-----Original Message-----
From: James Marca [mailto:[email protected]] 
Sent: Wednesday, February 03, 2010 10:16 AM
To: DBIx::Class user and developer list
Subject: Re: [Dbix-class] paging question

On Tue, Feb 02, 2010 at 11:14:27AM -0800, James Marca wrote:
> Hi.  
> 
> First DBIx::Class is excellent and kudos all around.
> 
...

> My query is rather large and expensive.  I'm using paging not to drive
> a UI, but because when running the request without paging psql
> complained that it ran out of disk space for the tmp file it needed
> for the cursor (I forget the exact wording of the error).  My page
> size is currently set at 4 million rows, which seems fine.
> 
> My guess is that under the hood paging is preparing the query, then
> executing that prepared statement repeatedly with different offsets.
> However, in practice it *appears* (I can't prove this yet so I could
> be wrong) that the page(2), page(3), etc queries take just as long to
> execute as the first page(1) query, as if they aren't running off of
> the same prepared statement.
> 
> And because the queries take so long, I really don't want to run that
> last page(n+1) query, wait an age, and then  get nothing.  

More details.  I stuck in a line to print the timestamp at the
beginning and end of the query.  The queries for page 1 and page 2
both took around 4 hours to complete.  While it could be that I'm just
out of luck here, I'd appreciate some advice from anybody who knows
the code well as to whether or not manually setting up a prepared
statement is worth it.

>From what I read in the code and from reading the docs, I see that
DBIx::Class caches statements, but I'm not sure whether or not that is
the same thing as reusing a prepared statement with different
parameters.

Regards,
James

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


_______________________________________________
List: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/dbix-class
IRC: irc.perl.org#dbix-class
SVN: http://dev.catalyst.perl.org/repos/bast/DBIx-Class/
Searchable Archive: http://www.grokbase.com/group/[email protected]

_______________________________________________
List: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/dbix-class
IRC: irc.perl.org#dbix-class
SVN: http://dev.catalyst.perl.org/repos/bast/DBIx-Class/
Searchable Archive: http://www.grokbase.com/group/[email protected]

Reply via email to