I am sorry, I don't see why it should match at all - you are searching
for different things in different indices.
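
To make the totals and the returned hits comparable, the count (size 0) and the sample (size 20) need to come from the same index and the same query. Below is a minimal sketch of that idea, not a definitive implementation: `term_query` and `count_and_sample` are hypothetical helper names, `client` is assumed to be an `Elasticsearch` instance whose `search` method accepts `index`, `doc_type` and `body` keyword arguments, and `hits.total` is assumed to be a plain integer as in the output you posted.

```python
def term_query(field, values):
    """Build a bool/should body that matches any of the given exact terms."""
    return {"query": {"bool": {"should": [{"term": {field: v}} for v in values]}}}


def count_and_sample(client, index, query_body, sample_size=20, doc_type=None):
    """Run one query against one index twice: once for the total, once for hits.

    Assumption: client.search(index=..., body=...) returns the usual
    {"hits": {"total": ..., "hits": [...]}} structure.
    """
    # Only pass doc_type when the caller supplies one.
    extra = {"doc_type": doc_type} if doc_type else {}
    # Same query in both requests; only the size differs.
    count_body = dict(query_body, size=0)
    sample_body = dict(query_body, size=sample_size)
    total = client.search(index=index, body=count_body, **extra)["hits"]["total"]
    hits = client.search(index=index, body=sample_body, **extra)["hits"]["hits"]
    return total, hits
```

In your code the totals are read from 'survey_data' while the size-20 searches go to 'tns_survey_data', and the first pair also uses two different terms ("original" and "original_amit2"). With a helper like this, a call such as `count_and_sample(es, 'survey_data', term_query('file', ['original']), doc_type='primary')` keeps both requests on the same index and query.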

On Mon, Jan 13, 2014 at 3:08 PM, G Kerekes <[email protected]> wrote:
> Hi Honza,
>
> This is my "full" code:
>
> from elasticsearch import Elasticsearch
> import json
> import pandas as pd
> import numpy as np
> import os
>
>
>
> ### create the connection to the ES
> es = Elasticsearch("host:port", timeout=600, max_retries=10,
> revival_delay=0)
>
>
> ############################################################
> ####### READ IN THE ORIGINAL SURVEY DATA ###################
> ############################################################
>
> origall = es.search('survey_data' ,'primary',
>                    body = {"query":
>                         {"bool":
>                             {"must":
>                                 [{
>                                     "term": {"file": "original"}
>                                 }]
>                                 }
>                         }
>                         ,"size" : 0}
>                     )
>
> total_o = origall['hits']['total']
>
> origall_o = es.search('tns_survey_data','primary',
>                    body = {"query":
>                         {"bool":
>                             {"must":
>                                 [{
>                                     "term": {"file": "original_amit2"}
>                                 }]
>                                 }
>                         }
>                         ,"size" : 20
>
>                     }
> )
>
>
> ## force it to data frame
> orig_dict = origall_o['hits']['hits']
>
>
> ############################################################
> ####### READ IN THE NEW SURVEY DATA ########################
> ############################################################
>
>
> ### get the documents
> newall = es.search('survey_data','primary',
>                        {"query":
>                            {
>                           "bool":
>                               {
>                              "should":[
>                                 {
>                                    "term":{
>                                       "file":"destinationqc22"
>                                    }
>                                 },
>                                 {
>                                    "term":{
>                                       "file":"destinationqc33"
>                                    }
>                                 },
>                                 {
>                                    "term":{
>                                       "file":"destinationqc44"
>                                    }
>                                 }
>                              ]
>                           }
>                        }
>                         ,"size" : 0
>                     }
>  )
>
> total_n = newall['hits']['total']
>
> newall_n = es.search('tns_survey_data','primary',
>                        {"query":
>                            {
>                           "bool":
>                               {
>                              "should":[
>                                 {
>                                    "term":{
>                                       "file":"destinationqc22"
>                                    }
>                                 },
>                                 {
>                                    "term":{
>                                       "file":"destinationqc33"
>                                    }
>                                 },
>                                 {
>                                    "term":{
>                                       "file":"destinationqc44"
>                                    }
>                                 }
>                              ]
>                           }
>                        }
>                         ,"size" : 20
>                     }
>  )
>
>
> ## force it to data frame
> new_dict = newall_n['hits']['hits']
>
> ##
>
> print(origall_o)
> print(newall_n)
>
> print orig_dict
>
> print new_dict
>
> And when I run it I get this:
>
> >>> print(origall_o)
> {u'hits': {u'hits': [], u'total': 110950, u'max_score': 0.7038795},
> u'_shards': {u'successful': 3, u'failed': 0, u'total': 3}, u'took': 15,
> u'timed_out': False}
> >>> print(newall_n)
> {u'hits': {u'hits': [], u'total': 110950, u'max_score': 0.7038795},
> u'_shards': {u'successful': 3, u'failed': 0, u'total': 3}, u'took': 15,
> u'timed_out': False}
> >>>
> >>> print orig_dict
> []
> >>>
> >>> print new_dict
> []
> >>>
>
>
> And what I would expect is:
> origall_o total is correct (110k hits)
> newall_n total should be 84k, not sure why it has the same 110k as for the
> origall_o
>
> And for the orig_dict and new_dict I would expect to see those 20 documents
> that I query.
>
> Many thanks for your help.
>
>
> Geza
>
>
>
> On Monday, January 13, 2014 12:16:53 PM UTC, Honza Král wrote:
>>
>> Hi Geza,
>>
>> I don't understand what you mean by re-running. Can you post the complete
>> code?
>>
>> When you do a search with size: 20, can you just print the result of
>> the search method and see if that data is there?
>>
>> As a side note, it looks like you are trying to filter out some data.
>> While this works with a query, you will get much better performance
>> by using a filtered query with a filter instead of a plain query.
>>
>> Honza
>>
>> On Mon, Jan 13, 2014 at 10:38 AM, G Kerekes <[email protected]> wrote:
>> > Hello,
>> >
>> > I am querying an elasticsearch index from python. Issue 1 is that when I
>> > change my query and rerun it, my objects in Python don't get refreshed
>> > according to my modified query. Issue 2 is that even when I see that I got
>> > some hits, no data comes through at all (e.g. I see I've got 85k hits, but
>> > when I put them in a dictionary, it is empty).
>> >
>> > from elasticsearch import Elasticsearch
>> >
>> > es = Elasticsearch("host:port", timeout=600, max_retries=10,
>> > revival_delay=0)
>> >
>> >
>> > origall = es.search('esdata' ,'primary',
>> >                 {"query":
>> >                     {"bool":
>> >                         {"must_not":
>> >                             [{
>> >                                 "term": {"file": "original"}
>> >                             }]
>> >                             }
>> >                     }
>> >                     ,"size" : 0}
>> >                 )
>> >
>> > total_o = origall['hits']['total']
>> >
>> > At this stage for total_o I get 110k, which is correct. Then I rerun my
>> > query after changing the size=0 to size=20, and if I want to have a look
>> > at these 20 hits, I get nothing for this:
>> >
>> > orig = origall['hits']['hits']
>> > print(orig)
>> >
>> > Then I go back to my original query and change the must_not to must. In
>> > this way I should get 85k hits, but after rerunning it I still get 110k
>> > in total_o.
>> >
>> > It is quite random when it works and when it doesn't. Sometimes I get my
>> > expected 85k hits, but then this gets stuck, and when I change my query
>> > back to get the 110k, it is still 85k. Also, sometimes I get data in my
>> > orig = origall['hits']['hits'], but then, say, I change the size in my
>> > query to 0, rerun it, and origall['hits']['hits'] still gives me back
>> > the data.
>> >
>> > I use Anaconda, but I also tried PyCharm and the default Python IDLE;
>> > these behave the same. I tried creating separate ES connections for all
>> > my queries, which doesn't help. I played around with the cache, but no
>> > luck.
>> >
>> > I'm running it on a 64 bit, Windows 7 machine.
>> >
>> > Any idea what I'm doing wrong? Many thanks,
>> >
>> > Geza
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups
>> > "elasticsearch" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> > an
>> > email to [email protected].
>> > To view this discussion on the web visit
>> >
>> > https://groups.google.com/d/msgid/elasticsearch/adf4f92a-59f3-4189-ab87-8a2c13de7022%40googlegroups.com.
>> > For more options, visit https://groups.google.com/groups/opt_out.
>

