Also, it looks like reducing the number of records indexed per run on a large site makes the overall indexing go faster. Most of the slowness appears to come from building the data in memory before it is passed to the index.
e.g. 420,000 records at 10,000 records per run took 90 mins; 420,000 records at 5,000 records per run took 45 mins.

--
https://bugs.launchpad.net/bugs/1732565

Title:
  Allow faster indexing of elasticsearch via cli script

Status in Mahara:
  In Progress

Bug description:
  When one re-indexes a large site, it can take hours before the site is fully re-indexed. This is because, even though we index via the bulk system, indexing is limited by the number of records we can read into memory and by the speed of the cron run.

  One way to speed this up is a fast-index CLI script that fires off the next Elasticsearch indexing run immediately after the previous one finishes. This would save the 'dead time' between runs spent waiting for the server clock to tick over to the next minute.
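The control flow proposed in the bug (start the next indexing run as soon as the previous one returns, with a configurable records-per-run limit) can be sketched as below. This is only an illustrative model, not Mahara's code: Mahara's indexing is written in PHP, and `index_batch` here is a hypothetical stand-in for "one indexing run" that indexes up to `limit` queued records and returns how many it processed. The fake queue exists only to exercise the loop.

```python
from typing import Callable


def run_until_drained(index_batch: Callable[[int], int], batch_size: int) -> int:
    """Call index_batch back to back until the queue is empty.

    index_batch(limit) is a HYPOTHETICAL stand-in for one Elasticsearch
    indexing run: it indexes up to `limit` queued records and returns the
    number actually processed. Returns the number of non-empty runs made.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    runs = 0
    while True:
        processed = index_batch(batch_size)
        if processed == 0:
            break  # queue drained, nothing left to index
        runs += 1
        # No sleep here: the next run starts immediately instead of
        # waiting for the next cron minute (the "dead time" in the bug).
    return runs


def make_fake_queue(total: int) -> Callable[[int], int]:
    """Build a fake index_batch that drains an in-memory counter (testing only)."""
    remaining = total

    def index_batch(limit: int) -> int:
        nonlocal remaining
        n = min(limit, remaining)
        remaining -= n
        return n

    return index_batch


if __name__ == "__main__":
    for batch in (10_000, 5_000):
        runs = run_until_drained(make_fake_queue(420_000), batch)
        print(f"{batch} records per run -> {runs} runs")
```

Applied to the figures in the comment, 420,000 records take 42 runs at 10,000 per run and 84 runs at 5,000 per run. The reported times then work out to about 90 / 42 ≈ 2.1 minutes per run versus 45 / 84 ≈ 32 seconds per run: halving the batch size cut the time per run by a factor of four. That is consistent with, though it does not prove, the observation that the in-memory preparation step is where most of the time goes.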

