I just noticed some typos in my code; please see the fixed version below (the
queried index and filter terms were not consistent).
On Monday, January 13, 2014 2:08:25 PM UTC, G Kerekes wrote:
>
> Hi Honza,
>
> This is my "full" code:
>
> from elasticsearch import Elasticsearch
> import json
> import pandas as pd
> import numpy as np
> import os
>
>
>
> ### create the connection to the ES
> es = Elasticsearch("host:port", timeout=600, max_retries=10, revival_delay=0)
>
>
> ############################################################
> ####### READ IN THE ORIGINAL SURVEY DATA ###################
> ############################################################
>
> origall = es.search('survey_data', 'primary',
>     body = {
>         "query": {
>             "bool": {
>                 "must": [
>                     {"term": {"file": "original"}}
>                 ]
>             }
>         },
>         "size": "0"
>     }
> )
>
> total_o = origall['hits']['total']
>
> origall_o = es.search('survey_data', 'primary',
>     body = {
>         "query": {
>             "bool": {
>                 "must": [
>                     {"term": {"file": "original"}}
>                 ]
>             }
>         },
>         "size": 20
>     }
> )
>
>
> ## extract the list of hits (intended for a data frame)
> orig_dict = origall_o['hits']['hits']
>
>
> ############################################################
> ####### READ IN THE NEW SURVEY DATA ########################
> ############################################################
>
>
> ### get the documents
> newall = es.search('survey_data', 'primary',
>     {
>         "query": {
>             "bool": {
>                 "should": [
>                     {"term": {"file": "destinationqc22"}},
>                     {"term": {"file": "destinationqc33"}},
>                     {"term": {"file": "destinationqc44"}}
>                 ]
>             }
>         },
>         "size": "0"
>     }
> )
>
> total_n = newall['hits']['total']
>
> newall_n = es.search('survey_data', 'primary',
>     {
>         "query": {
>             "bool": {
>                 "should": [
>                     {"term": {"file": "destinationqc22"}},
>                     {"term": {"file": "destinationqc33"}},
>                     {"term": {"file": "destinationqc44"}}
>                 ]
>             }
>         },
>         "size": 20
>     }
> )
>
>
> ## extract the list of hits (intended for a data frame)
> new_dict = newall_n['hits']['hits']
>
> ##
>
> print(origall_o)
> print(newall_n)
>
> print orig_dict
>
> print new_dict
>
> And when I run it I get this:
>
> >>> print(origall_o)
> {u'hits': {u'hits': [], u'total': 110950, u'max_score': 0.7038795},
> u'_shards': {u'successful': 3, u'failed': 0, u'total': 3}, u'took': 15,
> u'timed_out': False}
> >>> print(newall_n)
> {u'hits': {u'hits': [], u'total': 110950, u'max_score': 0.7038795},
> u'_shards': {u'successful': 3, u'failed': 0, u'total': 3}, u'took': 15,
> u'timed_out': False}
> >>>
> >>> print orig_dict
> []
> >>>
> >>> print new_dict
> []
> >>>
>
>
> And what I would expect is:
> origall_o total is correct (110k hits)
> newall_n total should be 84k; I'm not sure why it has the same 110k as
> origall_o
>
> And for the orig_dict and new_dict I would expect to see those 20
> documents that I query.
>
> Many thanks for your help.
>
>
> Geza
>
>
>
> On Monday, January 13, 2014 12:16:53 PM UTC, Honza Král wrote:
>>
>> Hi Geza,
>>
>> I don't understand what you mean by re-running. Can you post the complete
>> code?
>>
>> When you do a search with size: 20, can you just print the result of
>> the search method and see if that data is there?
>>
>> As a side note, it looks like you are trying to filter out some data.
>> While this works with a query, you will get much better performance
>> when using a filtered query with a filter instead of a plain query.
>>
>> Honza
>>
>> On Mon, Jan 13, 2014 at 10:38 AM, G Kerekes <[email protected]> wrote:
>> > Hello,
>> >
>> > I am querying an Elasticsearch index from Python. Issue 1 is that when I
>> > change my query and rerun it, my objects in Python don't get refreshed
>> > according to my modified query. Issue 2 is that even if I see that I got
>> > some hits, no data comes through at all (e.g. I see I've got 85k hits, but
>> > when I put it in a dictionary, it is blank).
>> >
>> > from elasticsearch import Elasticsearch
>> >
>> > es = Elasticsearch("host:port", timeout=600, max_retries=10,
>> > revival_delay=0)
>> >
>> >
>> > origall = es.search('esdata' ,'primary',
>> > {"query":
>> > {"bool":
>> > {"must_not":
>> > [{
>> > "term": {"file": "original"}
>> > }]
>> > }
>> > }
>> > ,"size" : "0"}
>> > )
>> >
>> > total_o = origall['hits']['total']
>> >
>> > At this stage for total_o I get 110k, which is correct. Then I rerun my
>> > query after changing the size=0 to size=20, and if I want to have a look at
>> > these 20 hits, I get nothing for this:
>> >
>> > orig = origall['hits']['hits']
>> > print(orig)
>> >
>> > Then I go back to my original query and change the must_not to must. In this
>> > way I should get 85k hits, but after rerunning it I still get 110k in
>> > total_o.
>> >
>> > It is quite random when it works and when it doesn't. Sometimes I get my
>> > expected 85k hits, but then this gets stuck and when I change my query back
>> > to get the 110k, it would still be 85k. Also sometimes I get data in my
>> > orig = origall['hits']['hits'], but then let's say I change the size in my
>> > query to 0, rerun it and the origall['hits']['hits'] will still give me back
>> > the data.
>> >
>> > I use Anaconda, but also tried PyCharm and the default Python IDLE; these
>> > behave the same. I tried creating separate ES connections for all my queries,
>> > but that doesn't help. I played around with the cache, but no luck.
>> >
>> > I'm running it on a 64 bit, Windows 7 machine.
>> >
>> > Any idea what I'm doing wrong? Many thanks,
>> >
>> > Geza
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> Groups
>> > "elasticsearch" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> an
>> > email to [email protected].
>> > To view this discussion on the web visit
>> >
>> https://groups.google.com/d/msgid/elasticsearch/adf4f92a-59f3-4189-ab87-8a2c13de7022%40googlegroups.com.
>>
>>
>> > For more options, visit https://groups.google.com/groups/opt_out.
>>
>
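For reference, Honza's side note about using a filtered query could be
sketched roughly as below. This is only a minimal sketch, not code from the
thread: the helper name build_filtered_body is hypothetical, and it assumes an
Elasticsearch version that still supports the legacy "filtered" query with a
"terms" filter (later versions replaced it with a bool query carrying a filter
clause). It only builds the request body and does not talk to a cluster.

```python
def build_filtered_body(field, values, size=20):
    """Build a search body matching documents whose `field` equals any of `values`.

    Hypothetical helper (not from the thread). Assumes an Elasticsearch
    version that still accepts the legacy `filtered` query.
    """
    return {
        "query": {
            "filtered": {
                # match_all means no scoring query; the filter does the selection
                "query": {"match_all": {}},
                "filter": {"terms": {field: list(values)}},
            }
        },
        "size": size,
    }


# Same three file values as in the quoted "should" query
body = build_filtered_body(
    "file", ["destinationqc22", "destinationqc33", "destinationqc44"], size=20
)
```

The resulting dict could then be passed the same way as in the quoted code,
e.g. es.search('survey_data', 'primary', body=body).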
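The "data frame" step in the quoted code only pulls out the raw list of hits.
A minimal sketch of flattening such a response into plain records follows.
The helper name hits_to_records and the sample response are made up for
illustration, and the sketch assumes each hit carries its document under
"_source", as in a default search response.

```python
def hits_to_records(response):
    """Return one dict per hit, combining the document id with its `_source` fields."""
    records = []
    for hit in response.get("hits", {}).get("hits", []):
        record = {"_id": hit.get("_id")}
        record.update(hit.get("_source", {}))
        records.append(record)
    return records


# Made-up response in the general shape of a search result
sample_response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_id": "1", "_source": {"file": "original", "answer": 3}},
            {"_id": "2", "_source": {"file": "original", "answer": 5}},
        ],
    }
}

records = hits_to_records(sample_response)

# A response with a total but no hits, like the output quoted in the thread
empty = hits_to_records({"hits": {"hits": [], "total": 110950}})
```

If pandas is available, pd.DataFrame(records) would turn the records into a
data frame. An empty hits list simply yields an empty list here, so checking
len(records) against the reported total is a quick way to spot the mismatch
described in the thread.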