Re: Python Elasticsearch query not returning the expected results when running subsequent calls

G Kerekes Mon, 13 Jan 2014 06:08:55 -0800

Hi Honza,

This is my "full" code:


from elasticsearch import Elasticsearch
import json
import pandas as pd
import numpy as np
import os



### create the connection to the ES
es = Elasticsearch("host:port", timeout=600, max_retries=10, revival_delay=0)


############################################################
####### READ IN THE ORIGINAL SURVEY DATA ###################
############################################################

origall = es.search('survey_data' ,'primary',
                   body = {"query": 
                        {"bool": 
                            {"must": 
                                [{
                                    "term": {"file": "original"}
                                }]
                                }
                        }
                        ,"size" : "0"}
                    )

total_o = origall['hits']['total']

origall_o = es.search('tns_survey_data','primary',
                   body = {"query": 
                        {"bool": 
                            {"must": 
                                [{
                                    "term": {"file": "original_amit2"}
                                }]
                                }
                        }
                        ,"size" : 20

                    }
)


## pull out the list of hits (to be turned into a data frame later)
orig_dict = origall_o['hits']['hits']


############################################################
####### READ IN THE NEW SURVEY DATA ########################
############################################################


### get the documents
newall = es.search('survey_data','primary',
                       {"query":
                           {
                          "bool":
                              {
                             "should":[
                                {
                                   "term":{
                                      "file":"destinationqc22"
                                   }
                                },            
                                {
                                   "term":{
                                      "file":"destinationqc33"
                                   }
                                },            
                                {
                                   "term":{
                                      "file":"destinationqc44"
                                   }
                                }
                             ]
                          }
                       }
                        ,"size" : "0"                       
                    }
 )

total_n = newall['hits']['total']

newall_n = es.search('tns_survey_data','primary',
                       {"query":
                           {
                          "bool":
                              {
                             "should":[
                                {
                                   "term":{
                                      "file":"destinationqc22"
                                   }
                                },            
                                {
                                   "term":{
                                      "file":"destinationqc33"
                                   }
                                },            
                                {
                                   "term":{
                                      "file":"destinationqc44"
                                   }
                                }
                             ]
                          }
                       }
                        ,"size" : 20                      
                    }
 )


## pull out the list of hits (to be turned into a data frame later)
new_dict = newall_n['hits']['hits']

##

print(origall_o)
print(newall_n)

print orig_dict

print new_dict

And when I run it I get this:

>>> print(origall_o)
{u'hits': {u'hits': [], u'total': 110950, u'max_score': 0.7038795}, 
u'_shards': {u'successful': 3, u'failed': 0, u'total': 3}, u'took': 15, 
u'timed_out': False}
>>> print(newall_n)
{u'hits': {u'hits': [], u'total': 110950, u'max_score': 0.7038795}, 
u'_shards': {u'successful': 3, u'failed': 0, u'total': 3}, u'took': 15, 
u'timed_out': False}
>>> 
>>> print orig_dict
[]
>>> 
>>> print new_dict
[]
>>> 


And what I would expect is:
The origall_o total is correct (110k hits).
The newall_n total should be 84k; I'm not sure why it shows the same 110k as 
origall_o.

For orig_dict and new_dict I would expect to see the 20 documents that I 
queried for.
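
To make the mismatch easier to see, here is a minimal sketch that compares the reported total with the number of documents actually returned. The helper name summarize_response is made up for illustration, and the dictionary is just the response printed above with the u'' prefixes dropped; no live connection is used.

```python
def summarize_response(response):
    """Return (total, returned) for a search response dictionary.

    'total' is the hit count reported by Elasticsearch; 'returned' is the
    number of documents actually present in response['hits']['hits'].
    """
    hits = response.get("hits", {})
    total = hits.get("total", 0)
    returned = len(hits.get("hits", []))
    return total, returned


# The response printed above for origall_o (hypothetical reuse, copied by hand).
origall_o = {
    "hits": {"hits": [], "total": 110950, "max_score": 0.7038795},
    "_shards": {"successful": 3, "failed": 0, "total": 3},
    "took": 15,
    "timed_out": False,
}

total, returned = summarize_response(origall_o)
print("total=%d returned=%d" % (total, returned))
```

With size set to 20 I would expect the second number to be 20, not 0.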

Many thanks for your help.


Geza



On Monday, January 13, 2014 12:16:53 PM UTC, Honza Král wrote:
>
> Hi Geza, 
>
> I don't understand what you mean by re-running, can you post the complete 
> code? 
>
> When you do a search with size: 20, can you just print the result of 
> the search method and see if that data is there? 
>
> As a side note it looks like you are trying to filter out some data, 
> while this works with a query you will get much better performance 
> when using a filtered query and a filter instead of a query. 
>
> Honza 
>
> On Mon, Jan 13, 2014 at 10:38 AM, G Kerekes <[email protected]> wrote: 
> > Hello, 
> > 
> > I am querying an elasticsearch index from python. Issue 1 is that when I 
> > change my query and rerun it, my objects in Python don't get refreshed 
> > according to my modified query. Issue 2 is that even if I see that I got 
> > some hits, no data comes through at all (eg I see I've got 85k hits, but 
> > when I put it in a dictionary, it is blank). 
> > 
> > from elasticsearch import Elasticsearch 
> > 
> > es = Elasticsearch("host:port", timeout=600, max_retries=10, 
> > revival_delay=0) 
> > 
> > 
> > origall = es.search('esdata' ,'primary', 
> >                 {"query": 
> >                     {"bool": 
> >                         {"must_not": 
> >                             [{ 
> >                                 "term": {"file": "original"} 
> >                             }] 
> >                             } 
> >                     } 
> >                     ,"size" : "0"} 
> >                 ) 
> > 
> > total_o = origall['hits']['total'] 
> > 
> > At this stage for total_o I get 110k, which is correct. Then I rerun my 
> > query after changing the size=0 to size=20, and if I want to have a look 
> > at these 20 hits, I get nothing for this: 
> > 
> > orig = origall['hits']['hits'] 
> > print(orig) 
> > 
> > Then I go back to my original query and change the must_not to must. In 
> > this way I should get 85k hits, but after rerunning it I still get 110k in 
> > total_o. 
> > 
> > It is quite random when it works and when it doesn't. Sometimes I get my 
> > expected 85k hits, but then this gets stuck, and when I change my query 
> > back to get the 110k, it is still 85k. Also, sometimes I get data in my 
> > orig = origall['hits']['hits'], but then, let's say, I change the size in 
> > my query to 0 and rerun it, and origall['hits']['hits'] still gives me 
> > back the data. 
> > 
> > I use Anaconda, but tried also in Pycharm and the default Python IDLE, 
> > these behave the same. Tried to create separate ES connections for all my 
> > queries, doesn't help. Played around with cache, but no luck. 
> > 
> > I'm running it on a 64 bit, Windows 7 machine. 
> > 
> > Any idea what I'm doing wrong? Many thanks, 
> > 
> > Geza 
> > 
> > -- 
> > You received this message because you are subscribed to the Google Groups 
> > "elasticsearch" group. 
> > To unsubscribe from this group and stop receiving emails from it, send an 
> > email to [email protected]. 
> > To view this discussion on the web visit 
> > https://groups.google.com/d/msgid/elasticsearch/adf4f92a-59f3-4189-ab87-8a2c13de7022%40googlegroups.com.
> > For more options, visit https://groups.google.com/groups/opt_out. 
>
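
Regarding the side note quoted above about using a filter instead of a query: a minimal sketch of what such a request body might look like, assuming the `filtered` query available in the Elasticsearch versions current at the time of this thread (later major versions replaced it with a bool query carrying a filter clause). The helper name filtered_term_body is hypothetical, and the field and value are taken from the code in this thread.

```python
import json


def filtered_term_body(field, value, size=20):
    """Build a search body that applies a term filter rather than a term query.

    Assumption: the cluster supports the 'filtered' query (Elasticsearch
    0.90/1.x era). The match_all query leaves scoring out of the picture and
    lets the filter alone decide which documents match.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {field: value}},
            }
        },
        "size": size,
    }


body = filtered_term_body("file", "original", size=20)
print(json.dumps(body, indent=2, sort_keys=True))

# With a live connection the body could then be passed along, for example:
# es.search(index="survey_data", doc_type="primary", body=body)
```

Building the body in a function also means every rerun starts from a fresh dictionary, so a changed size or term cannot be left over from an earlier call.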

