I just noticed some typos in my code; please see the fixed version below (the
queried index and filter terms were not consistent).


On Monday, January 13, 2014 2:08:25 PM UTC, G Kerekes wrote:
>
> Hi Honza,
>
> This is my "full" code:
>
> from elasticsearch import Elasticsearch
> import json
> import pandas as pd
> import numpy as np
> import os
>
>
>
> ### create the connection to the ES
> es = Elasticsearch("host:port", timeout=600, max_retries=10, revival_delay=0)
>
>
> ############################################################
> ####### READ IN THE ORIGINAL SURVEY DATA ###################
> ############################################################
>
> origall = es.search('survey_data' ,'primary',
>                    body = {"query": 
>                         {"bool": 
>                             {"must": 
>                                 [{
>                                     "term": {"file": "original"}
>                                 }]
>                                 }
>                         }
>                         ,"size" : "0"}
>                     )
>
> total_o = origall['hits']['total']
>
> origall_o = es.search('survey_data','primary',
>                    body = {"query": 
>                         {"bool": 
>                             {"must": 
>                                 [{
>                                     "term": {"file": "original"}
>                                 }]
>                                 }
>                         }
>                         ,"size" : 20
>
>                     }
> )
>
>
> ## force it to data frame
> orig_dict = origall_o['hits']['hits']
>
>
> ############################################################
> ####### READ IN THE NEW SURVEY DATA ########################
> ############################################################
>
>
> ### get the documents
> newall = es.search('survey_data','primary',
>                        {"query":
>                            {
>                           "bool":
>                               {
>                              "should":[
>                                 {
>                                    "term":{
>                                       "file":"destinationqc22"
>                                    }
>                                 },            
>                                 {
>                                    "term":{
>                                       "file":"destinationqc33"
>                                    }
>                                 },            
>                                 {
>                                    "term":{
>                                       "file":"destinationqc44"
>                                    }
>                                 }
>                              ]
>                           }
>                        }
>                         ,"size" : "0"                       
>                     }
>  )
>
> total_n = newall['hits']['total']
>
> newall_n = es.search('survey_data','primary',
>                        {"query":
>                            {
>                           "bool":
>                               {
>                              "should":[
>                                 {
>                                    "term":{
>                                       "file":"destinationqc22"
>                                    }
>                                 },            
>                                 {
>                                    "term":{
>                                       "file":"destinationqc33"
>                                    }
>                                 },            
>                                 {
>                                    "term":{
>                                       "file":"destinationqc44"
>                                    }
>                                 }
>                              ]
>                           }
>                        }
>                         ,"size" : 20                      
>                     }
>  )
>
>
> ## force it to data frame
> new_dict = newall_n['hits']['hits']
>
> ##
>
> print(origall_o)
> print(newall_n)
>
> print orig_dict
>
> print new_dict
>
> And when I run it I get this:
>
> >>> print(origall_o)
> {u'hits': {u'hits': [], u'total': 110950, u'max_score': 0.7038795}, 
> u'_shards': {u'successful': 3, u'failed': 0, u'total': 3}, u'took': 15, 
> u'timed_out': False}
> >>> print(newall_n)
> {u'hits': {u'hits': [], u'total': 110950, u'max_score': 0.7038795}, 
> u'_shards': {u'successful': 3, u'failed': 0, u'total': 3}, u'took': 15, 
> u'timed_out': False}
> >>> 
> >>> print orig_dict
> []
> >>> 
> >>> print new_dict
> []
> >>> 
>
>
> And what I would expect is:
> origall_o total is correct (110k hits)
> newall_n total should be 84k; I am not sure why it shows the same 110k as
> origall_o
>
> And for orig_dict and new_dict I would expect to see the 20 documents
> that I query for.
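
For reference, the documents returned by a search sit under hits.hits, each with its stored fields under _source. Below is a minimal sketch of pulling them into plain records, assuming the default response shape. The helper hits_to_records and the contents of sample_response are made up for illustration and are not part of the elasticsearch client:

```python
def hits_to_records(response):
    """Return one flat dict per hit: the document's _source plus its _id."""
    records = []
    for hit in response.get("hits", {}).get("hits", []):
        record = dict(hit.get("_source", {}))
        record["_id"] = hit.get("_id")
        records.append(record)
    return records


# Hypothetical response shaped like the return value of es.search();
# the "answer" field is invented for this example.
sample_response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_id": "1", "_source": {"file": "original", "answer": 3}},
            {"_id": "2", "_source": {"file": "original", "answer": 5}},
        ],
    }
}

records = hits_to_records(sample_response)

# A response with a non-zero total but no returned hits yields no records.
empty = hits_to_records({"hits": {"total": 110950, "hits": []}})

print(records)
print(empty)
```

A list like records can be handed to pd.DataFrame(records). When the response carries an empty hits list, as in the output above, the result is simply an empty list, whatever total says.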
>
> Many thanks for your help.
>
>
> Geza
>
>
>
> On Monday, January 13, 2014 12:16:53 PM UTC, Honza Král wrote:
>>
>> Hi Geza, 
>>
>> I don't understand what you mean by re-running, can you post the complete 
>> code? 
>>
>> When you do a search with size: 20, can you just print the result of 
>> the search method and see if that data is there? 
>>
>> As a side note, it looks like you are trying to filter out some data.
>> While this works with a query, you will get much better performance
>> when using a filtered query with a filter instead of a query.
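
A rough sketch of what such a body could look like, assuming an Elasticsearch release from around the time of this thread that still accepts the filtered query (later releases removed it). The helper name filtered_terms_body is hypothetical; the field and values are taken from Geza's code:

```python
import json


def filtered_terms_body(field, values, size=20):
    """Build a search body that matches all documents and filters on `field`.

    Assumes the `filtered` query and `terms` filter of the older query DSL.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"terms": {field: list(values)}},
            }
        },
        "size": size,
    }


body = filtered_terms_body(
    "file", ["destinationqc22", "destinationqc33", "destinationqc44"]
)
print(json.dumps(body, indent=2))
```

The resulting dict would be passed as the search body in place of the bool/should query. A terms filter matches indexed terms exactly, so the values have to correspond to how the file field was indexed.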
>>
>> Honza 
>>
>> On Mon, Jan 13, 2014 at 10:38 AM, G Kerekes <[email protected]> wrote: 
>> > Hello, 
>> > 
>> > I am querying an Elasticsearch index from Python. Issue 1 is that when I
>> > change my query and rerun it, my objects in Python don't get refreshed
>> > according to my modified query. Issue 2 is that even if I see that I got
>> > some hits, no data comes through at all (e.g. I see I've got 85k hits, but
>> > when I put it in a dictionary, it is blank).
>> > 
>> > from elasticsearch import Elasticsearch 
>> > 
>> > es = Elasticsearch("host:port", timeout=600, max_retries=10, 
>> > revival_delay=0) 
>> > 
>> > 
>> > origall = es.search('esdata' ,'primary', 
>> >                 {"query": 
>> >                     {"bool": 
>> >                         {"must_not": 
>> >                             [{ 
>> >                                 "term": {"file": "original"} 
>> >                             }] 
>> >                             } 
>> >                     } 
>> >                     ,"size" : "0"} 
>> >                 ) 
>> > 
>> > total_o = origall['hits']['total'] 
>> > 
>> > At this stage for total_o I get 110k, which is correct. Then I rerun my
>> > query after changing the size=0 to size=20, and if I want to have a look at
>> > these 20 hits, I get nothing for this:
>> > 
>> > orig = origall['hits']['hits'] 
>> > print(orig) 
>> > 
>> > Then I go back to my original query and change the must_not to must. In this
>> > way I should get 85k hits, but after rerunning it I still get 110k in
>> > total_o.
>> > 
>> > It is quite random when it works and when it doesn't. Sometimes I get my
>> > expected 85k hits, but then this gets stuck and when I change my query back
>> > to get the 110k, it would still be 85k. Also sometimes I get data in my
>> > orig = origall['hits']['hits'], but then let's say I change the size in my
>> > query to 0, rerun it and the origall['hits']['hits'] will still give me back
>> > the data.
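
One way to rule out hand-editing slips when switching between must and must_not, or between size 0 and size 20, is to generate the body from a single function and always read the hits from the response that call just returned. This is only a sketch and does not by itself explain the behaviour described above; bool_term_body is a hypothetical helper, not a client API:

```python
def bool_term_body(field, value, occur="must", size=0):
    """Build a bool query holding a single term clause under `occur`."""
    if occur not in ("must", "must_not", "should"):
        raise ValueError("occur must be 'must', 'must_not' or 'should'")
    return {
        "query": {"bool": {occur: [{"term": {field: value}}]}},
        "size": size,
    }


# The counting and fetching bodies differ only in size.
count_body = bool_term_body("file", "original", occur="must", size=0)
fetch_body = bool_term_body("file", "original", occur="must", size=20)

# The inverse query differs only in the occurrence type.
inverse_body = bool_term_body("file", "original", occur="must_not", size=0)

print(count_body)
print(fetch_body)
print(inverse_body)
```

Each body would then go to its own search call, with the result stored under a fresh name and printed straight away, so that an older response cannot be read by accident.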
>> > 
>> > I use Anaconda, but tried also in Pycharm and the default Python IDLE, these
>> > behave the same. Tried to create separate ES connections for all my queries,
>> > doesn't help. Played around with cache, but no luck.
>> > 
>> > I'm running it on a 64 bit, Windows 7 machine. 
>> > 
>> > Any idea what I'm doing wrong? Many thanks, 
>> > 
>> > Geza 
>> > 
>> > -- 
>> > You received this message because you are subscribed to the Google 
>> Groups 
>> > "elasticsearch" group. 
>> > To unsubscribe from this group and stop receiving emails from it, send 
>> an 
>> > email to [email protected]. 
>> > To view this discussion on the web visit 
>> > 
>> https://groups.google.com/d/msgid/elasticsearch/adf4f92a-59f3-4189-ab87-8a2c13de7022%40googlegroups.com.
>>  
>>
>> > For more options, visit https://groups.google.com/groups/opt_out. 
>>
>

