I have been trying to, but it's difficult. I ran it on 4 different machines
in the office: it works fine on 2 of them, but not on mine and 1 other. I
watched the HTTP calls my machine makes while running the code, and it seems
no calls are going out to ES at all. I'm not sure what causes it, but it
looks like a Python/machine issue rather than an ES one (when I query ES
from the Sense plugin in Chrome I always get the correct hits).
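
In case it's useful, something like this (the standard logging module plus
the client's own 'elasticsearch' logger, if I have its name right) should
show from inside Python whether any request is even attempted:

import logging
from elasticsearch import Elasticsearch

# send the client's request log (one line per HTTP call) to the console
logging.basicConfig(level=logging.INFO)
logging.getLogger('elasticsearch').setLevel(logging.INFO)

es = Elasticsearch("host:port")
es.info()  # a single ping; a log line should appear if a request goes out

On the two bad machines I'd expect that to stay silent.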
But thanks for trying anyway.
Geza
On Monday, January 13, 2014 2:35:57 PM UTC, Honza Král wrote:
>
> I can't replicate your problem; it all works for me. Could you please
> isolate a working example that reproduces the behavior? Thanks
>
> from elasticsearch import Elasticsearch
> es = Elasticsearch()
> es.index(index='i', doc_type='t', id=42, body={'hello': 'world'})
> es.index(index='i', doc_type='t', id=47, body={'hello': 'universe'})
> es.indices.refresh()
> es.search(index='i', doc_type='t', body={"query": {"match_all": {}},
> "size": 0})
> es.search(index='i', doc_type='t', body={"query": {"match_all": {}},
> "size": 1})
>
> works just fine for me
>
> On Mon, Jan 13, 2014 at 3:23 PM, G Kerekes <[email protected]> wrote:
> > Sorry Honza, I tried to somewhat anonymize my code but I wasn't
> > consistent about it. In reality I always query the same index, and my
> > filter terms are consistent too (original = original_amit2 and
> > tns_survey_data = survey_data).
> >
> >
> > 2014/1/13 G Kerekes <[email protected]>
> >>
> >> Just noticed some typos in my code; please see the fixed version below
> >> (the queried index and filter terms were not consistent).
> >>
> >>
> >> On Monday, January 13, 2014 2:08:25 PM UTC, G Kerekes wrote:
> >>>
> >>> Hi Honza,
> >>>
> >>> This is my "full" code:
> >>>
> >>> from elasticsearch import Elasticsearch
> >>> import json
> >>> import pandas as pd
> >>> import numpy as np
> >>> import os
> >>>
> >>>
> >>>
> >>> ### create the connection to the ES
> >>> es = Elasticsearch("host:port", timeout=600, max_retries=10,
> >>> revival_delay=0)
> >>>
> >>>
> >>> ############################################################
> >>> ####### READ IN THE ORIGINAL SURVEY DATA ###################
> >>> ############################################################
> >>>
> >>> origall = es.search('survey_data', 'primary',
> >>>     body={
> >>>         "query": {
> >>>             "bool": {
> >>>                 "must": [
> >>>                     {"term": {"file": "original"}}
> >>>                 ]
> >>>             }
> >>>         },
> >>>         "size": 0
> >>>     }
> >>> )
> >>>
> >>> total_o = origall['hits']['total']
> >>>
> >>> origall_o = es.search('survey_data', 'primary',
> >>>     body={
> >>>         "query": {
> >>>             "bool": {
> >>>                 "must": [
> >>>                     {"term": {"file": "original"}}
> >>>                 ]
> >>>             }
> >>>         },
> >>>         "size": 20
> >>>     }
> >>> )
> >>>
> >>>
> >>> ## force it to data frame
> >>> orig_dict = origall_o['hits']['hits']
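> >>> ## a possible next step (just a sketch), assuming each hit keeps its
> >>> ## fields under '_source':
> >>> # orig_df = pd.DataFrame([hit['_source'] for hit in orig_dict])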
> >>>
> >>>
> >>> ############################################################
> >>> ####### READ IN THE NEW SURVEY DATA ########################
> >>> ############################################################
> >>>
> >>>
> >>> ### get the documents
> >>> newall = es.search('survey_data', 'primary',
> >>>     body={
> >>>         "query": {
> >>>             "bool": {
> >>>                 "should": [
> >>>                     {"term": {"file": "destinationqc22"}},
> >>>                     {"term": {"file": "destinationqc33"}},
> >>>                     {"term": {"file": "destinationqc44"}}
> >>>                 ]
> >>>             }
> >>>         },
> >>>         "size": 0
> >>>     }
> >>> )
> >>>
> >>> total_n = newall['hits']['total']
> >>>
> >>> newall_n = es.search('survey_data', 'primary',
> >>>     body={
> >>>         "query": {
> >>>             "bool": {
> >>>                 "should": [
> >>>                     {"term": {"file": "destinationqc22"}},
> >>>                     {"term": {"file": "destinationqc33"}},
> >>>                     {"term": {"file": "destinationqc44"}}
> >>>                 ]
> >>>             }
> >>>         },
> >>>         "size": 20
> >>>     }
> >>> )
> >>>
> >>>
> >>> ## force it to data frame
> >>> new_dict = newall_n['hits']['hits']
> >>>
> >>> ##
> >>>
> >>> print(origall_o)
> >>> print(newall_n)
> >>>
> >>> print orig_dict
> >>>
> >>> print new_dict
> >>>
> >>> And when I run it I get this:
> >>>
> >>> >>> print(origall_o)
> >>> {u'hits': {u'hits': [], u'total': 110950, u'max_score': 0.7038795},
> >>> u'_shards': {u'successful': 3, u'failed': 0, u'total': 3}, u'took': 15,
> >>> u'timed_out': False}
> >>> >>> print(newall_n)
> >>> {u'hits': {u'hits': [], u'total': 110950, u'max_score': 0.7038795},
> >>> u'_shards': {u'successful': 3, u'failed': 0, u'total': 3}, u'took': 15,
> >>> u'timed_out': False}
> >>> >>>
> >>> >>> print orig_dict
> >>> []
> >>> >>>
> >>> >>> print new_dict
> >>> []
> >>> >>>
> >>>
> >>>
> >>> And what I would expect instead:
> >>> the origall_o total is correct (110k hits), but the newall_n total
> >>> should be 84k; I'm not sure why it shows the same 110k as origall_o.
> >>>
> >>> And for orig_dict and new_dict I would expect to see the 20 documents
> >>> that I query.
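> >>>
> >>> (A quick sanity check I could run, just a sketch reusing the same
> >>> index/field names as above: one size-0 search per file value, so each
> >>> total can be compared against the combined bool/should total.)
> >>>
> >>> for f in ["original", "destinationqc22", "destinationqc33",
> >>>           "destinationqc44"]:
> >>>     r = es.search('survey_data', 'primary',
> >>>                   body={"query": {"term": {"file": f}}, "size": 0})
> >>>     print f, r['hits']['total']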
> >>>
> >>> Many thanks for your help.
> >>>
> >>>
> >>> Geza
> >>>
> >>>
> >>>
> >>> On Monday, January 13, 2014 12:16:53 PM UTC, Honza Král wrote:
> >>>>
> >>>> Hi Geza,
> >>>>
> >>>> I don't understand what you mean by re-running; can you post the
> >>>> complete code?
> >>>>
> >>>> When you do a search with size: 20, can you just print the result of
> >>>> the search method and see if that data is there?
> >>>>
> >>>> As a side note, it looks like you are trying to filter out some data.
> >>>> While this works with a query, you will get much better performance by
> >>>> using a filtered query with a filter instead of a plain query.
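> >>>>
> >>>> For example, something along these lines (just a sketch using the names
> >>>> from your snippet; the must_not case would wrap the term filter in a
> >>>> "not" or bool filter instead):
> >>>>
> >>>> es.search('esdata', 'primary',
> >>>>     body={
> >>>>         "query": {
> >>>>             "filtered": {
> >>>>                 "query": {"match_all": {}},
> >>>>                 "filter": {"term": {"file": "original"}}
> >>>>             }
> >>>>         },
> >>>>         "size": 20
> >>>>     })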
> >>>>
> >>>> Honza
> >>>>
> >>>> On Mon, Jan 13, 2014 at 10:38 AM, G Kerekes <[email protected]> wrote:
> >>>> > Hello,
> >>>> >
> >>>> > I am querying an Elasticsearch index from Python. Issue 1 is that
> >>>> > when I change my query and rerun it, my objects in Python don't get
> >>>> > refreshed according to the modified query. Issue 2 is that even when
> >>>> > I can see that I got some hits, no data comes through at all (e.g. I
> >>>> > see I've got 85k hits, but when I put them in a dictionary, it is
> >>>> > blank).
> >>>> >
> >>>> > from elasticsearch import Elasticsearch
> >>>> >
> >>>> > es = Elasticsearch("host:port", timeout=600, max_retries=10,
> >>>> > revival_delay=0)
> >>>> >
> >>>> >
> >>>> > origall = es.search('esdata', 'primary',
> >>>> >     body={
> >>>> >         "query": {
> >>>> >             "bool": {
> >>>> >                 "must_not": [
> >>>> >                     {"term": {"file": "original"}}
> >>>> >                 ]
> >>>> >             }
> >>>> >         },
> >>>> >         "size": 0
> >>>> >     }
> >>>> > )
> >>>> >
> >>>> > total_o = origall['hits']['total']
> >>>> >
> >>>> > At this stage total_o is 110k, which is correct. Then I rerun my
> >>>> > query after changing size=0 to size=20, and when I want to have a
> >>>> > look at those 20 hits, I get nothing from this:
> >>>> >
> >>>> > orig = origall['hits']['hits']
> >>>> > print(orig)
> >>>> >
> >>>> > Then I go back to my original query and change the must_not to must.
> >>>> > That way I should get 85k hits, but after rerunning it I still get
> >>>> > 110k in total_o.
> >>>> >
> >>>> > It is quite random when it works and when it doesn't. Sometimes I
> >>>> > get the expected 85k hits, but then it gets stuck: when I change my
> >>>> > query back to get the 110k, the total stays at 85k. Sometimes I do
> >>>> > get data in orig = origall['hits']['hits'], but then if I change the
> >>>> > size in my query to 0 and rerun it, origall['hits']['hits'] still
> >>>> > gives me back the data.
> >>>> >
> >>>> > I use Anaconda, but I also tried PyCharm and the default Python
> >>>> > IDLE; they all behave the same. Creating separate ES connections for
> >>>> > each query doesn't help, and playing around with the cache brought
> >>>> > no luck either.
> >>>> >
> >>>> > I'm running it on a 64-bit Windows 7 machine.
> >>>> >
> >>>> > Any idea what I'm doing wrong? Many thanks,
> >>>> >
> >>>> > Geza
> >>>> >