Re: [Boston.pm] Passing large complex data structures between process

Morse, Richard E., MGH    Thu, 04 Apr 2013 08:49:22 -0700

On Apr 3, 2013, at 10:34 AM, David Larochelle <[email protected]> wrote:


> Currently, the driver process periodically queries a database to get a
> list of URLs to crawl. It then stores these URLs to be downloaded in a
> complex in-memory data structure and pipes them to separate processes that
> do the actual downloading. The problem is that the database queries are slow
> and block the driver process.

Hi! The first thing I would do is profile the database queries. If the queries
are very slow, there are probably steps that would speed them up. An index
might help, or a materialized view. You could also look at adjusting the
schema. How does the data get in? Is there a simple step you could take when
adding data to the database that would put the URLs in a separate table, so
that your query can change from a complex multi-join query to a simple
"SELECT * FROM table"?
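
To make the "put the URLs in some other table when adding data" idea concrete,
here is a minimal sketch in Python using the standard-library sqlite3 module.
Everything in it is an assumption for illustration: the table names (feeds,
stories, pending_urls), the trigger name, and the "active feed" condition are
hypothetical, and SQLite only stands in for whatever database you actually
run. The point is that the filtering work happens once at insert time, so the
driver's query becomes a plain read of one small table.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a trigger-maintained queue table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical schema, standing in for the real multi-join setup.
        CREATE TABLE feeds (
            id     INTEGER PRIMARY KEY,
            active INTEGER NOT NULL
        );
        CREATE TABLE stories (
            id      INTEGER PRIMARY KEY,
            feed_id INTEGER NOT NULL REFERENCES feeds(id),
            url     TEXT NOT NULL
        );

        -- Side table holding only what the driver needs.
        CREATE TABLE pending_urls (
            story_id INTEGER PRIMARY KEY,
            url      TEXT NOT NULL
        );

        -- When a story is inserted, queue its URL if the feed is active.
        CREATE TRIGGER queue_new_story AFTER INSERT ON stories
        BEGIN
            INSERT INTO pending_urls (story_id, url)
            SELECT NEW.id, NEW.url
            FROM feeds
            WHERE feeds.id = NEW.feed_id AND feeds.active = 1;
        END;
        """
    )
    return conn


def claim_pending(conn, limit=100):
    """Read up to `limit` queued URLs and remove them from the queue."""
    with conn:  # one transaction: commit on success, roll back on error
        rows = conn.execute(
            "SELECT story_id, url FROM pending_urls ORDER BY story_id LIMIT ?",
            (limit,),
        ).fetchall()
        conn.executemany(
            "DELETE FROM pending_urls WHERE story_id = ?",
            [(story_id,) for story_id, _ in rows],
        )
    return [url for _, url in rows]
```

The driver would then call claim_pending(conn) instead of running the join.
Whether a trigger, a materialized view, or a change in the code that inserts
the data fits best depends on your database and on how the rows arrive, so
treat this as one possible shape rather than a recommendation.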

Ricky




