Mike, are there unique features of joblib that you need to use?
Scraping web pages is often a good candidate for asyncio-based models: the work is I/O-bound, so a single process juggling concurrent requests usually beats spreading it across CPU cores. I've put a couple of sketches below the quoted message.

cheers

On 03/08/2018 11:41 PM, Mike Dewhirst wrote:
> https://media.readthedocs.org/pdf/joblib/latest/joblib.pdf
>
> I'm trying to make the following code run in parallel on separate CPU
> cores but haven't had any success.
>
>     def make_links(self):
>         for db in databases:
>             link = create_useful_link(self, Link, db)
>             if link:
>                 scrape_db(self, link, db)
>
> This is a web scraper which is working nicely in a leisurely
> sequential manner. databases is a list of urls with gaps to be filled
> in by create_useful_link(), which makes a link record from the Link
> class. The self instance is a source of attributes for filling the
> url gaps. self is a chemical substance, and the link record's url
> field, when clicked in a browser, will bring up that external website
> with the chemical substance selected for the viewer to research. If
> successful, we then fetch the external page, scrape a bunch of
> interesting data from it and turn that into substance notes.
> scrape_db() doesn't return anything, but it does create up to nine
> other records.
>
>     from joblib import Parallel, delayed
>
>     class Substance( etc ..
>         ...
>         def make_links(self):
>             #Parallel(n_jobs=-2)(delayed(
>             #    scrape_db(self, create_useful_link(self, Link, db), db)
>             #    for db in databases
>             #))
>
> I'm getting a TypeError from Parallel delayed() - can't pickle
> generator objects
>
> So my question is how to write the commented code properly? I suspect
> I haven't done enough comprehension.
>
> Thanks for any help
>
> Mike
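On the TypeError: delayed() needs to be given the bare function, which it wraps so the call is recorded rather than executed; the arguments go in a second set of parentheses, and the generator expression wraps the whole delayed(scrape_db)(...) call as the argument to Parallel(). Your version handed delayed() the generator itself, which joblib then tried (and failed) to pickle. A minimal sketch, assuming scrape_db() and create_useful_link() are module-level functions and that self, Link and the db urls all pickle cleanly:

    from joblib import Parallel, delayed

    def make_links(self):
        # Build the links sequentially - this part is cheap - and only
        # farm the slow scraping out to the worker processes.
        pairs = [(db, create_useful_link(self, Link, db)) for db in databases]
        Parallel(n_jobs=-2)(
            delayed(scrape_db)(self, link, db)  # delayed(func)(args), not delayed(func(args))
            for db, link in pairs
            if link
        )

One caveat: since scrape_db() creates records, each worker process will need its own database connection; if an ORM is involved, make sure connections aren't inherited across the fork.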
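And the asyncio version, for comparison. This is only a sketch: it assumes the fetch-and-parse step can be split out of scrape_db(), and fetch/parse names like parse_and_save_notes() are placeholders for whatever that split looks like. It uses the third-party aiohttp library for the concurrent requests:

    import asyncio
    import aiohttp

    async def scrape_one(session, substance, db):
        # create_useful_link() is cheap and synchronous; only the
        # network round-trip needs to be concurrent.
        link = create_useful_link(substance, Link, db)
        if not link:
            return
        async with session.get(link.url) as response:
            html = await response.text()
        # Placeholder for the scraping / record-creation step:
        parse_and_save_notes(substance, link, db, html)

    async def scrape_all(substance):
        async with aiohttp.ClientSession() as session:
            await asyncio.gather(*(scrape_one(session, substance, db)
                                   for db in databases))

    def make_links(self):
        asyncio.get_event_loop().run_until_complete(scrape_all(self))

Everything stays in one process, so there's no pickling and no fork; the speed-up comes from overlapping the waits on the remote sites, which for scraping is usually where all the time goes.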
_______________________________________________ melbourne-pug mailing list melbourne-pug@python.org https://mail.python.org/mailman/listinfo/melbourne-pug