Humbedooh commented on issue #489: Errors with Elasticsearch 5.x
URL: https://github.com/apache/incubator-ponymail/issues/489#issuecomment-486490116

so, in elastic.py I'd propose the following:

~~~ diff
diff --git a/tools/elastic.py b/tools/elastic.py
index 897cb64..8152922 100755
--- a/tools/elastic.py
+++ b/tools/elastic.py
@@ -109,14 +109,27 @@ class Elastic:
         )
 
     def scan(self, doc_type='mbox', scroll='3m', size = 100, **kwargs):
-        return self.es.search(
+        """ Run a backwards compatible scan/scroll, passing an iterator
+            that returns one page of hits per iteration. This
+            incorporates es.scroll for continuous iteration, and thus the
+            scroll() does NOT need to be called at all by the calling
+            process. """
+        results = self.es.search(
             index=self.dbname,
             doc_type=doc_type,
-            search_type = 'scan',
             size = size,
             scroll = scroll,
             **kwargs
         )
+        sid = results['_scroll_id']
+        scroll_size = results['hits']['total']
+        if results['hits'].get('hits', []):
+            yield results
+        while (scroll_size > 0):
+            results = self.scroll(scroll_id = sid, scroll = scroll)
+            sid = results['_scroll_id']
+            scroll_size = len(results['hits']['hits'])
+            yield results
 
     def get(self, **kwargs):
         return self.es.get(index=self.dbname, **kwargs)
~~~

And then in edit-lists.py, we change it to:

~~~ python
query = ...
for page in es.scan(body = query):
    proposed_changes = process_hits(page, args, dbname) # Split into separate function
~~~
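To make the page-per-iteration idea concrete, here is a minimal, self-contained sketch of the same generator pattern. It is not the patch itself: `FakeElasticClient`, `scan_pages` and `collect_ids` are hypothetical names invented for illustration, and the fake client only mimics the response shape used in the diff (`_scroll_id` plus `hits.hits`). Unlike the proposed `scan()`, this variant stops as soon as a page comes back empty, so the caller never receives an empty final page.

```python
from typing import Any, Dict, Iterator, List


class FakeElasticClient:
    """Hypothetical in-memory stand-in for an Elasticsearch client.

    It only imitates the parts of the search/scroll responses that the
    generator below reads; it is not the real elasticsearch-py API.
    """

    def __init__(self, docs: List[Dict[str, Any]]) -> None:
        self._docs = list(docs)
        self._cursors: Dict[str, int] = {}  # scroll id -> next offset
        self._sizes: Dict[str, int] = {}    # scroll id -> page size
        self._next_id = 0

    def _page(self, sid: str) -> Dict[str, Any]:
        # Return the next slice of documents and advance the cursor.
        start = self._cursors[sid]
        hits = self._docs[start:start + self._sizes[sid]]
        self._cursors[sid] = start + len(hits)
        return {'_scroll_id': sid,
                'hits': {'total': len(self._docs), 'hits': hits}}

    def search(self, size: int = 100, scroll: str = '3m',
               **kwargs: Any) -> Dict[str, Any]:
        # Open a new scroll context and return the first page.
        sid = 'scroll-%d' % self._next_id
        self._next_id += 1
        self._cursors[sid] = 0
        self._sizes[sid] = size
        return self._page(sid)

    def scroll(self, scroll_id: str, scroll: str = '3m') -> Dict[str, Any]:
        # Return the next page of an existing scroll context.
        return self._page(scroll_id)


def scan_pages(client: Any, size: int = 100, scroll: str = '3m',
               **kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield one page of hits per iteration until a page comes back empty."""
    results = client.search(size=size, scroll=scroll, **kwargs)
    while results['hits'].get('hits'):
        yield results
        results = client.scroll(scroll_id=results['_scroll_id'],
                                scroll=scroll)


def collect_ids(client: Any, size: int) -> List[List[str]]:
    """Example caller: gather the document ids of every page."""
    return [[hit['_id'] for hit in page['hits']['hits']]
            for page in scan_pages(client, size=size)]
```

A caller loops over `scan_pages(...)` the same way edit-lists.py would loop over `es.scan(body = query)`, without ever calling `scroll()` itself.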
