Humbedooh commented on issue #489: Errors with Elasticsearch 5.x
URL: 
https://github.com/apache/incubator-ponymail/issues/489#issuecomment-486490116
 
 
   so, in elastic.py I'd propose the following:
   
   ~~~ diff
   diff --git a/tools/elastic.py b/tools/elastic.py
   index 897cb64..8152922 100755
   --- a/tools/elastic.py
   +++ b/tools/elastic.py
   @@ -109,14 +109,27 @@ class Elastic:
            )
    
        def scan(self, doc_type='mbox', scroll='3m', size = 100, **kwargs):
   -        return self.es.search(
   +        """ Run a backwards compatible scan/scroll, passing an iterator
   +            that returns one page of hits per iteration. This
   +            incorporates es.scroll for continuous iteration, and thus the
   +            scroll() does NOT need to be called at all by the calling
   +            process. """
   +        results = self.es.search(
                index=self.dbname,
                doc_type=doc_type,
   -            search_type = 'scan',
                size = size,
                scroll = scroll,
                **kwargs
            )
   +        sid = results['_scroll_id']
   +        scroll_size = results['hits']['total']
   +        if results['hits'].get('hits', []):
   +            yield results
   +        while (scroll_size > 0):
   +            results = self.scroll(scroll_id = sid, scroll = scroll)
   +            sid = results['_scroll_id']
   +            scroll_size = len(results['hits']['hits'])
   +            yield results
    
        def get(self, **kwargs):
            return self.es.get(index=self.dbname, **kwargs)
   ~~~
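
To see the paging pattern from the diff in isolation, here is a minimal, self-contained sketch. `FakeES` and `scan_pages` are hypothetical names invented for illustration; they are not part of tools/elastic.py or of the Elasticsearch client, and the stub only mimics the `_scroll_id` / `hits` response fields used above. Note one deliberate difference: with the loop as written in the diff, the last scroll response (with no hits) is still yielded before the loop ends, whereas this variant stops before yielding an empty page.

```python
class FakeES:
    """Hypothetical stand-in for an Elasticsearch client; serves canned pages."""

    def __init__(self, docs, page_size):
        self.docs = docs
        self.page_size = page_size
        self.pos = 0

    def _page(self):
        # Return the next slice of documents in a search-like response shape.
        hits = self.docs[self.pos:self.pos + self.page_size]
        self.pos += len(hits)
        return {
            '_scroll_id': 'scroll-%d' % self.pos,
            'hits': {'total': len(self.docs), 'hits': hits},
        }

    def search(self, **kwargs):
        # The initial search already returns the first page of hits.
        self.pos = 0
        return self._page()

    def scroll(self, scroll_id=None, scroll=None):
        return self._page()


def scan_pages(es, scroll='3m', **kwargs):
    """Yield one non-empty page of hits per iteration."""
    results = es.search(scroll=scroll, **kwargs)
    while results['hits'].get('hits'):
        yield results
        results = es.scroll(scroll_id=results['_scroll_id'], scroll=scroll)


docs = [{'_id': str(i)} for i in range(5)]
pages = list(scan_pages(FakeES(docs, page_size=2), body={'query': {'match_all': {}}}))
page_sizes = [len(p['hits']['hits']) for p in pages]
print(page_sizes)
```

Whether the trailing empty page matters depends on the caller; if the per-page handler tolerates an empty hit list, the diff's version works as is.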
   
   And then in edit-lists.py, we change it to:
   
   ~~~ python
   query = ...
   for page in es.scan(body = query):
       proposed_changes = process_hits(page, args, dbname) # Split into separate function
   ~~~
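
As a rough sketch of what the split-out function could look like, assuming it only needs to turn each hit on a page into a proposed update: the `process_hits` below is a simplified, hypothetical version (its signature differs from the `process_hits(page, args, dbname)` call above), the `list` field name and list ID are illustrative, and the canned pages stand in for what `es.scan(body = query)` would yield.

```python
def process_hits(page, new_list_id):
    """Hypothetical helper: propose one update per hit on this page."""
    return [
        {'_id': hit['_id'], 'doc': {'list': new_list_id}}
        for hit in page['hits']['hits']
    ]


# Canned pages standing in for what es.scan(body = query) would yield.
pages = [
    {'hits': {'hits': [{'_id': 'a'}, {'_id': 'b'}]}},
    {'hits': {'hits': [{'_id': 'c'}]}},
    {'hits': {'hits': []}},  # trailing empty page, handled as a no-op
]

proposed_changes = []
for page in pages:
    proposed_changes.extend(process_hits(page, '<dev.example.org>'))

print(len(proposed_changes))
```

Keeping the per-page work in its own function means the calling loop never has to touch scroll IDs, and an empty page simply contributes no changes.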
