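I am sorry, but I don't see why the results should match at all: you are searching for different things in different indices. Your count queries go to 'survey_data', while the queries with size 20 go to 'tns_survey_data', and the first pair also uses different terms ("original" vs. "original_amit2").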
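To illustrate, here is a minimal sketch of one way to keep the total and the hits consistent: build the body in one place and read both values from a single response. The names INDEX, DOC_TYPE, file_query and total_and_hits are hypothetical, the index name is an assumption (use whichever index actually holds the data), and the positional (index, doc_type, body) call simply mirrors the es.search calls quoted below; other client versions may expect a different call shape.

```python
# Assumption: the index and doc type that actually hold the survey documents.
INDEX = "survey_data"
DOC_TYPE = "primary"


def file_query(files, size):
    """Build a search body matching documents whose 'file' term is any of `files`."""
    return {
        "query": {
            "bool": {
                "should": [{"term": {"file": name}} for name in files]
            }
        },
        "size": size,
    }


def total_and_hits(search, files, size=20):
    """Run one search and read the total and the hits from the same response.

    `search` is any callable with the (index, doc_type, body) call shape
    used in the quoted code, for example `es.search`.
    """
    response = search(INDEX, DOC_TYPE, file_query(files, size))
    return response["hits"]["total"], response["hits"]["hits"]


# Hypothetical usage against a live cluster:
# total_n, new_dict = total_and_hits(es.search, ["destinationqc22", "destinationqc33"])
```

Because the index, the terms and the size all come from one place, the total and the hits cannot silently refer to different indices or different queries.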
On Mon, Jan 13, 2014 at 3:08 PM, G Kerekes <[email protected]> wrote:
> Hi Honza,
>
> This is my "full" code:
>
> from elasticsearch import Elasticsearch
> import json
> import pandas as pd
> import numpy as np
> import os
>
> ### create the connection to the ES
> es = Elasticsearch("host:port", timeout=600, max_retries=10,
>                    revival_delay=0)
>
> ############################################################
> ####### READ IN THE ORIGINAL SURVEY DATA ###################
> ############################################################
>
> origall = es.search('survey_data' ,'primary',
>     body = {"query":
>         {"bool":
>             {"must":
>                 [{
>                     "term": {"file": "original"}
>                 }]
>             }
>         }
>     ,"size" : "0"}
> )
>
> total_o = origall['hits']['total']
>
> origall_o = es.search('tns_survey_data','primary',
>     body = {"query":
>         {"bool":
>             {"must":
>                 [{
>                     "term": {"file": "original_amit2"}
>                 }]
>             }
>         }
>     ,"size" : 20
>
>     }
> )
>
> ## force it to data frame
> orig_dict = origall_o['hits']['hits']
>
> ############################################################
> ####### READ IN THE NEW SURVEY DATA ########################
> ############################################################
>
> ### get the documents
> newall = es.search('survey_data','primary',
>     {"query":
>         {
>             "bool":
>             {
>                 "should":[
>                     {
>                         "term":{
>                             "file":"destinationqc22"
>                         }
>                     },
>                     {
>                         "term":{
>                             "file":"destinationqc33"
>                         }
>                     },
>                     {
>                         "term":{
>                             "file":"destinationqc44"
>                         }
>                     }
>                 ]
>             }
>         }
>     ,"size" : "0"
>     }
> )
>
> total_n = newall['hits']['total']
>
> newall_n = es.search('tns_survey_data','primary',
>     {"query":
>         {
>             "bool":
>             {
>                 "should":[
>                     {
>                         "term":{
>                             "file":"destinationqc22"
>                         }
>                     },
>                     {
>                         "term":{
>                             "file":"destinationqc33"
>                         }
>                     },
>                     {
>                         "term":{
>                             "file":"destinationqc44"
>                         }
>                     }
>                 ]
>             }
>         }
>     ,"size" : 20
>     }
> )
>
> ## force it to data frame
> new_dict = newall_n['hits']['hits']
>
> ##
>
> print(origall_o)
> print(newall_n)
>
> print orig_dict
>
> print new_dict
>
> And then I run it I get this:
>
> >>> print(origall_o)
> {u'hits': {u'hits': [], u'total': 110950, u'max_score': 0.7038795},
> u'_shards': {u'successful': 3, u'failed': 0, u'total': 3}, u'took': 15,
> u'timed_out': False}
> >>> print(newall_n)
> {u'hits': {u'hits': [], u'total': 110950, u'max_score': 0.7038795},
> u'_shards': {u'successful': 3, u'failed': 0, u'total': 3}, u'took': 15,
> u'timed_out': False}
> >>>
> >>> print orig_dict
> []
> >>>
> >>> print new_dict
> []
> >>>
>
> And what I would expect is:
> origall_o total is correct (110k hits)
> newall_n total should be 84k, not sure why it has the same 110k as for the
> origall_o
>
> And for the orig_dict and new_dict I would expect to see those 20 documents
> that I query.
>
> Many thanks for your help.
>
> Geza
>
> On Monday, January 13, 2014 12:16:53 PM UTC, Honza Král wrote:
>>
>> Hi Geza,
>>
>> I don't understand what you mean by re-running, can you post the complete
>> code?
>>
>> When you do a search with size: 20, can you just print the result of
>> the search method and see if that data is there?
>>
>> As a side note it looks like you are trying to filter out some data,
>> while this works with a query you will get much better performance
>> when using a filtered query and a filter instead of a query.
>>
>> Honza
>>
>> On Mon, Jan 13, 2014 at 10:38 AM, G Kerekes <[email protected]> wrote:
>> > Hello,
>> >
>> > I am querying an elasticsearch index from python. Issue 1 is that when I
>> > change my query and rerun it, my objects in Python don't get refreshed
>> > according to my modified query. Issue 2 is that even if I see that I got
>> > some hits, no data comes through at all (eg I see I've got 85k hits, but
>> > when I put it in a dictionary, it is blank).
>> >
>> > from elasticsearch import Elasticsearch
>> >
>> > es = Elasticsearch("host:port", timeout=600, max_retries=10,
>> >                    revival_delay=0)
>> >
>> > origall = es.search('esdata' ,'primary',
>> >     {"query":
>> >         {"bool":
>> >             {"must_not":
>> >                 [{
>> >                     "term": {"file": "original"}
>> >                 }]
>> >             }
>> >         }
>> >     ,"size" : "0"}
>> > )
>> >
>> > total_o = origall['hits']['total']
>> >
>> > At this stage for total_o I get 110k, which is correct. Then I rerun my
>> > query after changing the size=0 to size=20, and if I want to have a look
>> > at these 20 hits, I get nothing for this:
>> >
>> > orig = origall['hits']['hits']
>> > print(orig)
>> >
>> > Then I go back to my original query and change the must_not to must. In
>> > this way I should get 85k hits, but after rerunning it I still get 110k
>> > in total_o.
>> >
>> > It is quite random when it works and when it doesn't. Sometimes I get my
>> > expected 85k hits, but then this get stuck and when I change my query
>> > back to get the 110k, it would still be 85k. Also sometimes I get data in
>> > my orig = origall['hits']['hits'], but then let's say I change the size
>> > in my query to 0, rerun it and the origall['hits']['hits'] will still
>> > give me back the data.
>> >
>> > I use Anaconda, but tried also in Pycharm and the default Python IDLE,
>> > these behave the same. Tried to create separate ES connections for all my
>> > queries, doesn't help. Played around with cache, but no luck.
>> >
>> > I'm running it on a 64 bit, Windows 7 machine.
>> >
>> > Any idea what I'm doing wrong? Many thanks,
>> >
>> > Geza
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "elasticsearch" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> > an email to [email protected].
>> > To view this discussion on the web visit
>> > https://groups.google.com/d/msgid/elasticsearch/adf4f92a-59f3-4189-ab87-8a2c13de7022%40googlegroups.com.
>> > For more options, visit https://groups.google.com/groups/opt_out.
