Hi Honza,
This is my "full" code:
from elasticsearch import Elasticsearch
import json
import pandas as pd
import numpy as np
import os
### create the connection to the ES
es = Elasticsearch("host:port", timeout=600, max_retries=10, revival_delay=0)
############################################################
####### READ IN THE ORIGINAL SURVEY DATA ###################
############################################################
origall = es.search('survey_data', 'primary',
    body={
        "query": {
            "bool": {
                "must": [
                    {"term": {"file": "original"}}
                ]
            }
        },
        "size": "0"
    }
)
total_o = origall['hits']['total']
origall_o = es.search('tns_survey_data', 'primary',
    body={
        "query": {
            "bool": {
                "must": [
                    {"term": {"file": "original_amit2"}}
                ]
            }
        },
        "size": 20
    }
)
## pull out the list of hits (to be turned into a data frame later)
orig_dict = origall_o['hits']['hits']
############################################################
####### READ IN THE NEW SURVEY DATA ########################
############################################################
### get the documents
newall = es.search('survey_data', 'primary',
    {
        "query": {
            "bool": {
                "should": [
                    {"term": {"file": "destinationqc22"}},
                    {"term": {"file": "destinationqc33"}},
                    {"term": {"file": "destinationqc44"}}
                ]
            }
        },
        "size": "0"
    }
)
total_n = newall['hits']['total']
newall_n = es.search('tns_survey_data', 'primary',
    {
        "query": {
            "bool": {
                "should": [
                    {"term": {"file": "destinationqc22"}},
                    {"term": {"file": "destinationqc33"}},
                    {"term": {"file": "destinationqc44"}}
                ]
            }
        },
        "size": 20
    }
)
## pull out the list of hits (to be turned into a data frame later)
new_dict = newall_n['hits']['hits']
##
print(origall_o)
print(newall_n)
print orig_dict
print new_dict
And when I run it I get this:
>>> print(origall_o)
{u'hits': {u'hits': [], u'total': 110950, u'max_score': 0.7038795},
u'_shards': {u'successful': 3, u'failed': 0, u'total': 3}, u'took': 15,
u'timed_out': False}
>>> print(newall_n)
{u'hits': {u'hits': [], u'total': 110950, u'max_score': 0.7038795},
u'_shards': {u'successful': 3, u'failed': 0, u'total': 3}, u'took': 15,
u'timed_out': False}
>>>
>>> print orig_dict
[]
>>>
>>> print new_dict
[]
>>>
And what I would expect is:
origall_o: the total is correct (110k hits).
newall_n: the total should be 84k; I am not sure why it shows the same 110k
as origall_o.
For orig_dict and new_dict I would expect to see the 20 documents that I
queried.
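To make that expectation concrete, here is a minimal sketch of pulling the total and the documents out of a response shaped like the ones printed above. The `sample_response` dict and its field values are made up for illustration, and `hits_to_rows` is a hypothetical helper, not part of the elasticsearch client. It only assumes the `hits.total` and `hits.hits[]._source` layout of a search response.

```python
def hits_to_rows(response):
    """Return (total, rows) from a search response dict (hypothetical helper)."""
    hits = response.get("hits", {})
    total = hits.get("total", 0)
    rows = []
    for hit in hits.get("hits", []):
        # Copy the stored document so the original response is not mutated
        row = dict(hit.get("_source", {}))
        row["_id"] = hit.get("_id")
        rows.append(row)
    return total, rows


# Made-up response, for illustration only
sample_response = {
    "hits": {
        "total": 2,
        "max_score": 1.0,
        "hits": [
            {"_id": "a1", "_source": {"file": "original", "answer": 3}},
            {"_id": "a2", "_source": {"file": "original", "answer": 5}},
        ],
    }
}

total, rows = hits_to_rows(sample_response)
print(total)
print(rows)
```

With a response like the ones I actually get back (a non-zero total but an empty hits list), this returns the total and an empty list of rows. With pandas, `pd.DataFrame(rows)` would then give one row per document.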
Many thanks for your help.
Geza
On Monday, January 13, 2014 12:16:53 PM UTC, Honza Král wrote:
>
> Hi Geza,
>
> I don't understand what you mean by re-running, can you post the complete
> code?
>
> When you do a search with size: 20, can you just print the result of
> the search method and see if that data is there?
>
> As a side note, it looks like you are trying to filter out some data.
> While this works with a query, you will get much better performance
> when using a filtered query and a filter instead of a query.
>
> Honza
>
> On Mon, Jan 13, 2014 at 10:38 AM, G Kerekes <[email protected]> wrote:
> > Hello,
> >
> > I am querying an elasticsearch index from python. Issue 1 is that when I
> > change my query and rerun it, my objects in Python don't get refreshed
> > according to my modified query. Issue 2 is that even if I see that I got
> > some hits, no data comes through at all (eg I see I've got 85k hits, but
> > when I put it in a dictionary, it is blank).
> >
> > from elasticsearch import Elasticsearch
> >
> > es = Elasticsearch("host:port", timeout=600, max_retries=10,
> > revival_delay=0)
> >
> >
> > origall = es.search('esdata' ,'primary',
> > {"query":
> > {"bool":
> > {"must_not":
> > [{
> > "term": {"file": "original"}
> > }]
> > }
> > }
> > ,"size" : "0"}
> > )
> >
> > total_o = origall['hits']['total']
> >
> > At this stage for total_o I get 110k, which is correct. Then I rerun my
> > query after changing the size=0 to size=20, and if I want to have a look at
> > these 20 hits, I get nothing for this:
> >
> > orig = origall['hits']['hits']
> > print(orig)
> >
> > Then I go back to my original query and change the must_not to must. In this
> > way I should get 85k hits, but after rerunning it I still get 110k in
> > total_o.
> >
> > It is quite random when it works and when it doesn't. Sometimes I get my
> > expected 85k hits, but then this gets stuck and when I change my query back
> > to get the 110k, it would still be 85k. Also sometimes I get data in my
> > orig = origall['hits']['hits'], but then let's say I change the size in my
> > query to 0, rerun it and the origall['hits']['hits'] will still give me back
> > the data.
> >
> > I use Anaconda, but tried also in Pycharm and the default Python IDLE, these
> > behave the same. Tried to create separate ES connections for all my queries,
> > doesn't help. Played around with cache, but no luck.
> >
> > I'm running it on a 64 bit, Windows 7 machine.
> >
> > Any idea what I'm doing wrong? Many thanks,
> >
> > Geza
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "elasticsearch" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to [email protected].
> > To view this discussion on the web visit
> > https://groups.google.com/d/msgid/elasticsearch/adf4f92a-59f3-4189-ab87-8a2c13de7022%40googlegroups.com.
> > For more options, visit https://groups.google.com/groups/opt_out.
>
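
Honza's side note about using a filter can be sketched roughly as follows. This assumes the Elasticsearch 1.x query DSL that was current at the time of this thread, where a `filtered` query wraps a `match_all` query and a `terms` filter. `build_file_filter_body` is a hypothetical helper name, and the `file` values are the ones from the code at the top of this message. The sketch only builds the request body; it would then be passed as `body=` to `es.search`.

```python
import json


def build_file_filter_body(file_values, size=20):
    """Build a search body that filters on the `file` field (hypothetical helper)."""
    return {
        "query": {
            "filtered": {
                # Match everything, then let the filter narrow the result set
                "query": {"match_all": {}},
                # A single `terms` filter matches documents whose `file`
                # field equals any one of the listed values
                "filter": {"terms": {"file": list(file_values)}},
            }
        },
        # Send size as an integer rather than a string such as "0"
        "size": int(size),
    }


body = build_file_filter_body(
    ["destinationqc22", "destinationqc33", "destinationqc44"]
)
print(json.dumps(body, indent=2, sort_keys=True))
```

The single `terms` filter stands in for the three `should` / `term` clauses, since a document matches when its `file` field equals any of the listed values.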