Sorry Honza, I tried to anonymize my code somewhat, but I was not consistent about it. I always query the same index, and my filter terms are consistent too (original = original_amit2 and tns_survey_data = survey_data).
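To avoid this kind of mismatch in future, one option is to define the index, doc type and file values once and build both the "count" and "fetch" bodies from them. This is only a rough sketch: INDEX, DOC_TYPE and build_body are names made up for illustration, and only 'survey_data', 'primary' and the "file" field come from the code quoted below. It has the same bool/should shape as the second query in the thread and does not touch the cluster.

```python
# Hypothetical constants and helper, for illustration only.
INDEX = "survey_data"
DOC_TYPE = "primary"


def build_body(files, size):
    """Build a bool/should term query on the "file" field for the given values."""
    return {
        "query": {
            "bool": {
                "should": [{"term": {"file": f}} for f in files],
            }
        },
        "size": size,
    }


# Same terms for both requests; only the size differs.
new_files = ["destinationqc22", "destinationqc33", "destinationqc44"]
count_body = build_body(new_files, 0)    # only hits.total is of interest
fetch_body = build_body(new_files, 20)   # first 20 documents

# With a live connection the calls would mirror the ones in the thread, e.g.:
# newall = es.search(INDEX, DOC_TYPE, body=count_body)
# newall_n = es.search(INDEX, DOC_TYPE, body=fetch_body)
```

Because both bodies come from the same helper, the index name and the terms cannot drift apart between the count request and the fetch request.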
2014/1/13 G Kerekes <[email protected]>

> Just noticed some typos in my code, please see the fixed one below (the
> queried index and filter terms were not consistent)
>
>
> On Monday, January 13, 2014 2:08:25 PM UTC, G Kerekes wrote:
>
>> Hi Honza,
>>
>> This is my "full" code:
>>
>> from elasticsearch import Elasticsearch
>> import json
>> import pandas as pd
>> import numpy as np
>> import os
>>
>>
>> ### create the connection to the ES
>> es = Elasticsearch("host:port", timeout=600, max_retries=10, revival_delay=0)
>>
>>
>> ############################################################
>> ####### READ IN THE ORIGINAL SURVEY DATA ###################
>> ############################################################
>>
>> origall = es.search('survey_data' ,'primary',
>>     body = {"query":
>>         {"bool":
>>             {"must":
>>                 [{
>>                     "term": {"file": "original"}
>>                 }]
>>             }
>>         }
>>         ,"size" : "0"}
>>     )
>>
>> total_o = origall['hits']['total']
>>
>> origall_o = es.search('survey_data','primary',
>>     body = {"query":
>>         {"bool":
>>             {"must":
>>                 [{
>>                     "term": {"file": "original"}
>>                 }]
>>             }
>>         }
>>         ,"size" : 20
>>
>>         }
>>     )
>>
>>
>> ## force it to data frame
>> orig_dict = origall_o['hits']['hits']
>>
>>
>> ############################################################
>> ####### READ IN THE NEW SURVEY DATA ########################
>> ############################################################
>>
>>
>> ### get the documents
>> newall = es.search('survey_data','primary',
>>     {"query":
>>         {
>>             "bool":
>>             {
>>                 "should":[
>>                     {
>>                         "term":{
>>                             "file":"destinationqc22"
>>                         }
>>                     },
>>                     {
>>                         "term":{
>>                             "file":"destinationqc33"
>>                         }
>>                     },
>>                     {
>>                         "term":{
>>                             "file":"destinationqc44"
>>                         }
>>                     }
>>                 ]
>>             }
>>         }
>>         ,"size" : "0"
>>     }
>>     )
>>
>> total_n = newall['hits']['total']
>>
>> newall_n = es.search('survey_data','primary',
>>     {"query":
>>         {
>>             "bool":
>>             {
>>                 "should":[
>>                     {
>>                         "term":{
>>                             "file":"destinationqc22"
>>                         }
>>                     },
>>                     {
>>                         "term":{
>>                             "file":"destinationqc33"
>>                         }
>>                     },
>>                     {
>>                         "term":{
>>                             "file":"destinationqc44"
>>                         }
>>                     }
>>                 ]
>>             }
>>         }
>>         ,"size" : 20
>>     }
>>     )
>>
>>
>> ## force it to data frame
>> new_dict = newall_n['hits']['hits']
>>
>> ##
>>
>> print(origall_o)
>> print(newall_n)
>>
>> print orig_dict
>>
>> print new_dict
>>
>> And then I run it I get this:
>>
>> >>> print(origall_o)
>> {u'hits': {u'hits': [], u'total': 110950, u'max_score': 0.7038795},
>> u'_shards': {u'successful': 3, u'failed': 0, u'total': 3}, u'took': 15,
>> u'timed_out': False}
>> >>> print(newall_n)
>> {u'hits': {u'hits': [], u'total': 110950, u'max_score': 0.7038795},
>> u'_shards': {u'successful': 3, u'failed': 0, u'total': 3}, u'took': 15,
>> u'timed_out': False}
>> >>>
>> >>> print orig_dict
>> []
>> >>>
>> >>> print new_dict
>> []
>> >>>
>>
>>
>> And what I would expect is:
>> origall_o total is correct (110k hits)
>> newall_n total should be 84k, not sure why it has the same 110k as for
>> the origall_o
>>
>> And for the orig_dict and new_dict I would expect to see those 20
>> documents that I query.
>>
>> Many thanks for your help.
>>
>>
>> Geza
>>
>>
>>
>> On Monday, January 13, 2014 12:16:53 PM UTC, Honza Král wrote:
>>>
>>> Hi Geza,
>>>
>>> I don't understand what you mean by re-running, can you post the
>>> complete code?
>>>
>>> When you do a search with size: 20, can you just print the result of
>>> the search method and see if that data is there?
>>>
>>> As a side note it looks like you are trying to filter out some data,
>>> while this works with a query you will get much better performance
>>> when using a filtered query and a filter instead of a query.
>>>
>>> Honza
>>>
>>> On Mon, Jan 13, 2014 at 10:38 AM, G Kerekes <[email protected]> wrote:
>>> > Hello,
>>> >
>>> > I am querying an elasticsearch index from python. Issue 1 is that when I
>>> > change my query and rerun it, my objects in Python don't get refreshed
>>> > according to my modified query. Issue 2 is that even if I see that I got
>>> > some hits, no data comes through at all (eg I see I've got 85k hits, but
>>> > when I put it in a dictionary, it is blank).
>>> >
>>> > from elasticsearch import Elasticsearch
>>> >
>>> > es = Elasticsearch("host:port", timeout=600, max_retries=10, revival_delay=0)
>>> >
>>> >
>>> > origall = es.search('esdata' ,'primary',
>>> >     {"query":
>>> >         {"bool":
>>> >             {"must_not":
>>> >                 [{
>>> >                     "term": {"file": "original"}
>>> >                 }]
>>> >             }
>>> >         }
>>> >         ,"size" : "0"}
>>> >     )
>>> >
>>> > total_o = origall['hits']['total']
>>> >
>>> > At this stage for total_o I get 110k, which is correct. Then I rerun my
>>> > query after changing the size=0 to size=20, and if I want to have a look at
>>> > these 20 hits, I get nothing for this:
>>> >
>>> > orig = origall['hits']['hits']
>>> > print(orig)
>>> >
>>> > Then I go back to my original query and change the must_not to must. In this
>>> > way I should get 85k hits, but after rerunning it I still get 110k in
>>> > total_o.
>>> >
>>> > It is quite random when it works and when it doesn't. Sometimes I get my
>>> > expected 85k hits, but then this get stuck and when I change my query back
>>> > to get the 110k, it would still be 85k. Also sometimes I get data in my orig
>>> > = origall['hits']['hits'], but then let's say I change the size in my query
>>> > to 0, rerun it and the origall['hits']['hits'] will still give me back the
>>> > data.
>>> >
>>> > I use Anaconda, but tried also in Pycharm and the default Python IDLE, these
>>> > behave the same. Tried to create separate ES connections for all my queries,
>>> > doesn't help. Played around with cache, but no luck.
>>> >
>>> > I'm running it on a 64 bit, Windows 7 machine.
>>> >
>>> > Any idea what I'm doing wrong? Many thanks,
>>> >
>>> > Geza
>>> >
>>> > --
>>> > You received this message because you are subscribed to the Google Groups
>>> > "elasticsearch" group.
