On Apr 3, 2013, at 10:34 AM, David Larochelle <[email protected]> wrote:
> Currently, the driver process periodically queries a database to get a
> list of URLs to crawl. It then stores these URLs to be downloaded in a
> complex in memory and pipes them to separate processes that do the actual
> downloading. The problem is that the database queries are slow and block
> the driver process.

Hi!

The first thing I would do is profile the database queries. If the queries are very slow, there are likely steps that would speed them up. An index might help, or perhaps a materialized view. You could also look at adjusting the schema.

How does the data get in? Is there a simple step you could take when adding data to the database that would put the URLs in a separate table, so that your query can change from a complex multi-join query to a simple "SELECT * FROM table"?

Ricky
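
P.S. Here is a rough sketch of the "put the URLs in some other table" idea. It uses Python and SQLite purely for illustration, not because I know what your stack looks like. The table names (downloads, pending_urls), the trigger, and the helper functions are all hypothetical; the point is only that the work of deciding what needs crawling happens at insert time, so the driver's query becomes a cheap single-table read.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Stand-in for the real (complex) tables; names are assumptions.
        CREATE TABLE downloads (
            id    INTEGER PRIMARY KEY,
            url   TEXT NOT NULL,
            state TEXT NOT NULL DEFAULT 'pending'
        );

        -- Small queue table that the driver reads instead of joining.
        CREATE TABLE pending_urls (
            download_id INTEGER PRIMARY KEY REFERENCES downloads(id),
            url         TEXT NOT NULL
        );

        -- Populate the queue when a pending row is added.
        CREATE TRIGGER queue_new_download
        AFTER INSERT ON downloads
        WHEN NEW.state = 'pending'
        BEGIN
            INSERT INTO pending_urls (download_id, url)
            VALUES (NEW.id, NEW.url);
        END;
        """
    )
    return conn


def add_download(conn, url, state="pending"):
    """Insert a URL; the trigger queues it if it is pending."""
    conn.execute(
        "INSERT INTO downloads (url, state) VALUES (?, ?)", (url, state)
    )
    conn.commit()


def next_batch(conn, limit=100):
    """Fetch up to `limit` queued URLs and remove them from the queue."""
    rows = conn.execute(
        "SELECT download_id, url FROM pending_urls "
        "ORDER BY download_id LIMIT ?",
        (limit,),
    ).fetchall()
    conn.executemany(
        "DELETE FROM pending_urls WHERE download_id = ?",
        [(download_id,) for download_id, _ in rows],
    )
    conn.commit()
    return rows


if __name__ == "__main__":
    db = make_db()
    add_download(db, "http://example.com/a")
    add_download(db, "http://example.com/b", state="done")
    print(next_batch(db, limit=10))
```

The driver would call something like next_batch() on its polling interval. Whether a trigger, application code at insert time, or a materialized view is the right place to maintain the queue depends on your database and on how the data arrives, so profile first.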

