Just curious, are we re-indexing the whole thing each time, or does it take 40 minutes to update the index for 3 hours' worth of changes?
*Dan Poirier*
Developer
[email protected]
www.caktusgroup.com

On Thu, Sep 10, 2015 at 9:31 AM, Donald Stufft <[email protected]> wrote:

> On September 10, 2015 at 8:48:05 AM, David Wilson ([email protected]) wrote:
> > On Thu, Sep 10, 2015 at 03:07:14PM +0300, Ionel Cristian Mărieș wrote:
> >
> > > Wouldn't it be better if you'd just build an external search service?
> > > Getting a list of packages and descriptions should be possible no?
> > > (just asking, not 100% sure)
> >
> > That would be the idea. In fact preferably not build a service at all,
> > just pay someone $50/mo for hosted ElasticSearch, rip out the guts of
> > the old thing and write a small sync cron job similar to the one
> > existing in the Bitbucket repo I linked.
> >
>
> The old PostgreSQL based system has been gone for awhile, and we already
> have ElasticSearch with a small cron job that runs every 3 hours to index
> the data.
>
> When we moved the database to Heroku this cronjob started taking 6+ hours to
> complete, because we were fetching data in too small of chunks which didn't
> actually hurt when the script and the database were running close to each
> other. That got "fixed" a day or two ago by increasing the size of the chunks
> we pulled from 1000 to 10000 and by switching to a
> SERIALIZABLE READ ONLY DEFERRABLE transaction so that we only needed to hold
> open a lock right at the very beginning which has the job finishing in 40
> minutes now. I suspect further enhancements to the indexing speed will require
> locating the script in EC2 to get it closer to the PostgreSQL instance.
>
> Given that these problems seem to be *new* since the move of the database to
> Heroku, I don't think the shape of our data in Elasticsearch nor the actual
> query we're using which hasn't changed should be at fault, so I've been trying
> to figure out what else we might have changed in the transition that would have
> caused it.
>
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>
> _______________________________________________
> Distutils-SIG maillist - [email protected]
> https://mail.python.org/mailman/listinfo/distutils-sig
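
For anyone following along, here is a rough sketch of why the chunk size mentioned in the quoted message matters. It is not the actual indexing script, which is not shown in this thread. The table name `packages`, the helpers `iter_chunks` and `count_fetches`, and the 25,000-row figure are all made up for illustration, and the demo runs against an in-memory SQLite database so that it is self-contained. Against a remote PostgreSQL server, each fetch on a server-side cursor would presumably cost one network round trip, so fewer and larger chunks should mean fewer round trips.

```python
import sqlite3

# PostgreSQL statement for the transaction mode named in the thread. It is
# shown for reference only and is never executed here, because the demo
# below runs against SQLite, which does not support it.
SNAPSHOT_SQL = "BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY, DEFERRABLE"


def iter_chunks(cursor, chunk_size):
    """Yield lists of at most chunk_size rows until the cursor is exhausted."""
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            return
        yield rows


def count_fetches(n_rows, chunk_size):
    """Fill a throwaway table (hypothetical schema) and count non-empty fetches."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE packages (name TEXT, summary TEXT)")
    conn.executemany(
        "INSERT INTO packages VALUES (?, ?)",
        ((f"pkg{i}", f"summary {i}") for i in range(n_rows)),
    )
    cur = conn.cursor()
    cur.execute("SELECT name, summary FROM packages")
    fetches = sum(1 for _ in iter_chunks(cur, chunk_size))
    conn.close()
    return fetches


if __name__ == "__main__":
    # Compare the two chunk sizes from the thread on a made-up row count.
    for size in (1000, 10000):
        print(size, count_fetches(25000, size))
```

With the made-up 25,000 rows, a chunk size of 1000 needs 25 fetches and a chunk size of 10000 needs 3. The real saving depends on the actual row count and on the latency between the script and the database, neither of which is stated here.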
