On 10/03/2018 12:33 PM, paul sorenson wrote:

Mike,

Are there unique features of joblib that you need to use?


I was seduced by "Parallel". On reading the docs a little more diligently, it seems well suited to heavy compute-bound parallel work, such as scientific number crunching, and to caching results on disk to avoid re-computing them.

Scraping web pages is often a good candidate for asyncio based models.
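Here is a minimal sketch of that asyncio model (Python 3.7+ for asyncio.run). It assumes a hypothetical fetch() coroutine standing in for a real HTTP request. In this sketch it only sleeps, so it runs without network access or a third-party async HTTP client. asyncio.gather() starts all the fetches and waits for them together, so the total wait is roughly that of the slowest request rather than the sum of all of them:

```python
import asyncio

async def fetch(url, delay=0.1):
    # Hypothetical stand-in for an HTTP GET; a real scraper would await an
    # async HTTP client here. Awaiting the sleep hands control back to the
    # event loop, which is what lets the other fetches run meanwhile.
    await asyncio.sleep(delay)
    return f"<html>{url}</html>"

async def scrape_all(urls):
    # Schedule every fetch at once and wait for all of them; gather()
    # returns the results in the same order as the awaitables passed in.
    pages = await asyncio.gather(*(fetch(url) for url in urls))
    return dict(zip(urls, pages))

if __name__ == "__main__":
    results = asyncio.run(scrape_all(["http://example.com/a", "http://example.com/b"]))
    for url, page in results.items():
        print(url, len(page))
```

The parsing and record-creating steps would stay ordinary synchronous code, run on each page after it arrives.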


I think I'm being seduced by "io" in the name. I do judge books by their cover, so I think I'll read up on asyncio.

Thanks, Paul

Mike

cheers


On 03/08/2018 11:41 PM, Mike Dewhirst wrote:
https://media.readthedocs.org/pdf/joblib/latest/joblib.pdf

I'm trying to make the following code run in parallel on separate CPU cores but haven't had any success.

    def make_links(self):
        for db in databases:
            link = create_useful_link(self, Link, db)
            if link:
                scrape_db(self, link, db)

This is a web scraper which is working nicely in a leisurely sequential manner. databases is a list of URLs with gaps to be filled by create_useful_link(), which makes a link record from the Link class. The self instance is a source of attributes for filling the URL gaps.

self is a chemical substance, and the link record's url field, when clicked in a browser, will bring up that external website with the chemical substance selected for the viewer to research. If that succeeds, we then fetch the external page, scrape a bunch of interesting data from it and turn that into substance notes. scrape_db() doesn't return anything, but it does create up to nine other records.

         from joblib import Parallel, delayed

         class Substance( etc ..
             ...
             def make_links(self):
                 #Parallel(n_jobs=-2)(delayed(
                 #    scrape_db(self, create_useful_link(self, Link, db), db)
                 #    for db in databases
                 #))
I'm getting a TypeError from Parallel/delayed(): can't pickle generator objects

So my question is: how do I write the commented code properly? I suspect I haven't done enough comprehension.

Thanks for any help

Mike
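Here is a minimal sketch of how the commented call could be written, assuming joblib 0.12 or later (needed for the prefer argument). create_useful_link() and scrape_db() below are hypothetical stubs that only build and return strings so the example runs on its own. The Link argument is dropped, and make_links() is a plain function rather than a Substance method. There are three changes from the commented version:

1. delayed() wraps the function itself, and the arguments go in a second call, delayed(scrape_db)(...). Writing delayed(scrape_db(...)) runs scrape_db immediately in the parent process.
2. The generator expression is passed directly to Parallel(...)(...) rather than into delayed(). Handing delayed() a generator is likely what produced the "can't pickle generator objects" error, since joblib versions of that era appear to pickle whatever delayed() is given.
3. The "if link:" test from the sequential loop is kept as a filter.

```python
from joblib import Parallel, delayed

# Hypothetical stand-ins for the real helpers, which build Link records and
# scrape external sites; these just return strings so the sketch runs alone.
def create_useful_link(substance, db):
    return f"{db}?q={substance}" if db else None

def scrape_db(substance, link, db):
    return f"scraped {link}"

def make_links(substance, databases):
    # Build the links in the parent process, as the sequential loop did.
    links = [(db, create_useful_link(substance, db)) for db in databases]
    # delayed(scrape_db) returns a wrapper; calling it with the arguments
    # records the call without running it. Parallel runs the recorded
    # calls and returns their results in input order.
    return Parallel(n_jobs=4, prefer="threads")(
        delayed(scrape_db)(substance, link, db)
        for db, link in links
        if link  # skip databases for which no link could be made
    )

if __name__ == "__main__":
    print(make_links("aspirin", ["http://a.example", "", "http://b.example"]))
```

prefer="threads" is a judgment call here rather than something the joblib docs prescribe. Scraping mostly waits on the network, so threads can overlap that waiting without pickling self and the Link records for separate worker processes. Whether the real objects are safe to share across threads is worth checking separately.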


_______________________________________________
melbourne-pug mailing list
melbourne-pug@python.org
https://mail.python.org/mailman/listinfo/melbourne-pug
