I can't replicate your problem; it all works for me. Could you please
isolate a minimal working example that reproduces the behavior? Thanks.

from elasticsearch import Elasticsearch
es = Elasticsearch()
es.index(index='i', doc_type='t', id=42, body={'hello': 'world'})
es.index(index='i', doc_type='t', id=47, body={'hello': 'universe'})
es.indices.refresh()
es.search(index='i', doc_type='t', body={"query": {"match_all": {}}, "size": 0})
es.search(index='i', doc_type='t', body={"query": {"match_all": {}}, "size": 1})

This works just fine for me.
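
For anyone trying to narrow this down, here is a minimal sketch of how the
request bodies from the quoted code could be built in one place, with each
response checked right after the call. The helper names (bool_term_body,
filtered_terms_body, summarize) are made up for illustration and are not part
of elasticsearch-py; the "file" field and its values come from the quoted
code. The filtered form assumes the Elasticsearch 1.x query DSL, where a
"filtered" query wraps a filter; later major versions removed it in favour of
a bool query with a filter clause.

```python
def bool_term_body(clause, files, size):
    """Build a bool/term query body like the ones in the quoted code.

    clause is "must", "must_not" or "should"; files is a list of values
    for the "file" field. size is passed as an integer (the quoted code
    mixes the string "0" and the integer 20).
    """
    return {
        "query": {"bool": {clause: [{"term": {"file": f}} for f in files]}},
        "size": size,
    }


def filtered_terms_body(files, size):
    """Express the "should" case (any of the given files) as a filtered
    query with a terms filter. Assumes Elasticsearch 1.x syntax."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"terms": {"file": files}},
            }
        },
        "size": size,
    }


def summarize(response):
    """Return (total hits, number of hits actually returned)."""
    hits = response["hits"]
    return hits["total"], len(hits["hits"])


# Usage against a live cluster (not run here; index and type names are
# the ones from the quoted code):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch()
#   body = bool_term_body("must", ["original"], 20)
#   print(summarize(es.search(index="survey_data", doc_type="primary",
#                             body=body)))
```

Printing summarize(...) directly on the value returned by each search call,
rather than on a variable assigned earlier in an interactive session, makes
it easier to tell whether a stale object or the cluster itself is producing
the unexpected numbers.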

On Mon, Jan 13, 2014 at 3:23 PM, G Kerekes <[email protected]> wrote:
> Sorry Honza, I tried to somewhat anonymize my code but I was not consistent.
> Basically I always query the same index, and my filter terms are also
> consistent (original = original_amit2 and tns_survey_data = survey_data).
>
>
> 2014/1/13 G Kerekes <[email protected]>
>>
>> Just noticed some typos in my code, please see the fixed one below (the
>> queried index and filter terms were not consistent)
>>
>>
>> On Monday, January 13, 2014 2:08:25 PM UTC, G Kerekes wrote:
>>>
>>> Hi Honza,
>>>
>>> This is my "full" code:
>>>
>>> from elasticsearch import Elasticsearch
>>> import json
>>> import pandas as pd
>>> import numpy as np
>>> import os
>>>
>>>
>>>
>>> ### create the connection to the ES
>>> es = Elasticsearch("host:port", timeout=600, max_retries=10,
>>> revival_delay=0)
>>>
>>>
>>> ############################################################
>>> ####### READ IN THE ORIGINAL SURVEY DATA ###################
>>> ############################################################
>>>
>>> origall = es.search('survey_data' ,'primary',
>>>                    body = {"query":
>>>                         {"bool":
>>>                             {"must":
>>>                                 [{
>>>                                     "term": {"file": "original"}
>>>                                 }]
>>>                                 }
>>>                         }
>>>                         ,"size" : "0"}
>>>                     )
>>>
>>> total_o = origall['hits']['total']
>>>
>>> origall_o = es.search('survey_data','primary',
>>>                    body = {"query":
>>>                         {"bool":
>>>                             {"must":
>>>                                 [{
>>>                                     "term": {"file": "original"}
>>>                                 }]
>>>                                 }
>>>                         }
>>>                         ,"size" : 20
>>>
>>>                     }
>>> )
>>>
>>>
>>> ## force it to data frame
>>> orig_dict = origall_o['hits']['hits']
>>>
>>>
>>> ############################################################
>>> ####### READ IN THE NEW SURVEY DATA ########################
>>> ############################################################
>>>
>>>
>>> ### get the documents
>>> newall = es.search('survey_data','primary',
>>>                        {"query":
>>>                            {
>>>                           "bool":
>>>                               {
>>>                              "should":[
>>>                                 {
>>>                                    "term":{
>>>                                       "file":"destinationqc22"
>>>                                    }
>>>                                 },
>>>                                 {
>>>                                    "term":{
>>>                                       "file":"destinationqc33"
>>>                                    }
>>>                                 },
>>>                                 {
>>>                                    "term":{
>>>                                       "file":"destinationqc44"
>>>                                    }
>>>                                 }
>>>                              ]
>>>                           }
>>>                        }
>>>                         ,"size" : "0"
>>>                     }
>>>  )
>>>
>>> total_n = newall['hits']['total']
>>>
>>> newall_n = es.search('survey_data','primary',
>>>                        {"query":
>>>                            {
>>>                           "bool":
>>>                               {
>>>                              "should":[
>>>                                 {
>>>                                    "term":{
>>>                                       "file":"destinationqc22"
>>>                                    }
>>>                                 },
>>>                                 {
>>>                                    "term":{
>>>                                       "file":"destinationqc33"
>>>                                    }
>>>                                 },
>>>                                 {
>>>                                    "term":{
>>>                                       "file":"destinationqc44"
>>>                                    }
>>>                                 }
>>>                              ]
>>>                           }
>>>                        }
>>>                         ,"size" : 20
>>>                     }
>>>  )
>>>
>>>
>>> ## force it to data frame
>>> new_dict = newall_n['hits']['hits']
>>>
>>> ##
>>>
>>> print(origall_o)
>>> print(newall_n)
>>>
>>> print orig_dict
>>>
>>> print new_dict
>>>
>>> And when I run it I get this:
>>>
>>> >>> print(origall_o)
>>> {u'hits': {u'hits': [], u'total': 110950, u'max_score': 0.7038795},
>>> u'_shards': {u'successful': 3, u'failed': 0, u'total': 3}, u'took': 15,
>>> u'timed_out': False}
>>> >>> print(newall_n)
>>> {u'hits': {u'hits': [], u'total': 110950, u'max_score': 0.7038795},
>>> u'_shards': {u'successful': 3, u'failed': 0, u'total': 3}, u'took': 15,
>>> u'timed_out': False}
>>> >>>
>>> >>> print orig_dict
>>> []
>>> >>>
>>> >>> print new_dict
>>> []
>>> >>>
>>>
>>>
>>> And what I would expect is:
>>> origall_o total is correct (110k hits)
>>> newall_n total should be 84k, not sure why it has the same 110k as for
>>> the origall_o
>>>
>>> And for the orig_dict and new_dict I would expect to see those 20
>>> documents that I query.
>>>
>>> Many thanks for your help.
>>>
>>>
>>> Geza
>>>
>>>
>>>
>>> On Monday, January 13, 2014 12:16:53 PM UTC, Honza Král wrote:
>>>>
>>>> Hi Geza,
>>>>
>>>> I don't understand what you mean by re-running, can you post the
>>>> complete code?
>>>>
>>>> When you do a search with size: 20, can you just print the result of
>>>> the search method and see if that data is there?
>>>>
>>>> As a side note, it looks like you are trying to filter out some data.
>>>> While this works with a query, you will get much better performance
>>>> by using a filtered query with a filter instead.
>>>>
>>>> Honza
>>>>
>>>> On Mon, Jan 13, 2014 at 10:38 AM, G Kerekes <[email protected]> wrote:
>>>> > Hello,
>>>> >
>>>> > I am querying an Elasticsearch index from Python. Issue 1 is that
>>>> > when I change my query and rerun it, my objects in Python don't get
>>>> > refreshed according to my modified query. Issue 2 is that even if I
>>>> > see that I got some hits, no data comes through at all (e.g. I see
>>>> > I've got 85k hits, but when I put it in a dictionary, it is blank).
>>>> >
>>>> > from elasticsearch import Elasticsearch
>>>> >
>>>> > es = Elasticsearch("host:port", timeout=600, max_retries=10,
>>>> > revival_delay=0)
>>>> >
>>>> >
>>>> > origall = es.search('esdata' ,'primary',
>>>> >                 {"query":
>>>> >                     {"bool":
>>>> >                         {"must_not":
>>>> >                             [{
>>>> >                                 "term": {"file": "original"}
>>>> >                             }]
>>>> >                             }
>>>> >                     }
>>>> >                     ,"size" : "0"}
>>>> >                 )
>>>> >
>>>> > total_o = origall['hits']['total']
>>>> >
>>>> > At this stage for total_o I get 110k, which is correct. Then I rerun
>>>> > my query after changing the size=0 to size=20, and if I want to have
>>>> > a look at these 20 hits, I get nothing for this:
>>>> >
>>>> > orig = origall['hits']['hits']
>>>> > print(orig)
>>>> >
>>>> > Then I go back to my original query and change the must_not to must.
>>>> > In this way I should get 85k hits, but after rerunning it I still get
>>>> > 110k in total_o.
>>>> >
>>>> > It is quite random when it works and when it doesn't. Sometimes I get
>>>> > my expected 85k hits, but then this gets stuck and when I change my
>>>> > query back to get the 110k, it would still be 85k. Also sometimes I
>>>> > get data in my orig = origall['hits']['hits'], but then let's say I
>>>> > change the size in my query to 0, rerun it and the
>>>> > origall['hits']['hits'] will still give me back the data.
>>>> >
>>>> > I use Anaconda, but I also tried PyCharm and the default Python IDLE;
>>>> > these behave the same. I tried creating separate ES connections for
>>>> > all my queries, which doesn't help. I played around with the cache,
>>>> > but no luck.
>>>> >
>>>> > I'm running it on a 64-bit Windows 7 machine.
>>>> >
>>>> > Any idea what I'm doing wrong? Many thanks,
>>>> >
>>>> > Geza
>>>> >
>>>> > --
>>>> > You received this message because you are subscribed to the Google
>>>> > Groups
>>>> > "elasticsearch" group.
>>>> > To unsubscribe from this group and stop receiving emails from it, send
>>>> > an
>>>> > email to [email protected].
>>>> > To view this discussion on the web visit
>>>> >
>>>> > https://groups.google.com/d/msgid/elasticsearch/adf4f92a-59f3-4189-ab87-8a2c13de7022%40googlegroups.com.
>>>> > For more options, visit https://groups.google.com/groups/opt_out.
