Hi,

I appreciate your continued assistance. :)

Disclaimer: I do not yet understand the ES sources well enough to describe 
my scenario completely, so some of the information below may be conjecture.

I have a corpus of 50M docs (actually a lot more, but this is for testing 
for now), out of which up to 1M doc IDs would be used as a filter. This set 
of up to 1M docs can differ between use cases; the point is that up to 1M 
doc IDs can form one logical set of documents for filtering results. If I 
use a simple IdsFilter from the ES Java API, I would have to keep adding 
these 1M IDs to its internal List implementation, and I suspect that may 
not scale very well, since the sets can change per use case and also per 
combination within a single use case.
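For a rough sense of the scaling concern, here is a minimal sketch using only `java.util.BitSet`. It assumes the filter set can be expressed as integer ordinals in `[0, maxDoc)` (for example, Lucene-style doc numbers; note Jörg's caveat below that Lucene doc IDs are not stable). A bitset over 50M documents costs about 50,000,000 / 8 = 6,250,000 bytes no matter how many IDs are set, whereas a list of 1M IDs pays a reference plus a boxed object per entry. The class and method names are hypothetical.

```java
import java.util.BitSet;

public class DocIdSetSketch {
    // Builds a bitset sized for the whole corpus (maxDoc bits) and marks each
    // doc ordinal that belongs to one logical filter set.
    public static BitSet buildFilterSet(int maxDoc, int[] docOrdinals) {
        BitSet bits = new BitSet(maxDoc);
        for (int ord : docOrdinals) {
            if (ord < 0 || ord >= maxDoc) {
                throw new IllegalArgumentException("ordinal out of range: " + ord);
            }
            bits.set(ord); // setting the same ordinal twice is harmless
        }
        return bits;
    }

    // Approximate heap used by the backing long[]; BitSet.size() reports bits.
    public static long approxBytes(BitSet bits) {
        return bits.size() / 8L;
    }

    public static void main(String[] args) {
        int maxDoc = 50_000_000;             // corpus size from the example above
        int[] ords = new int[1_000_000];     // one logical set of 1M docs
        for (int i = 0; i < ords.length; i++) {
            ords[i] = i * 50;                // synthetic, evenly spread ordinals
        }
        BitSet filter = buildFilterSet(maxDoc, ords);
        System.out.println("docs in set:  " + filter.cardinality()); // 1000000
        System.out.println("approx bytes: " + approxBytes(filter));  // 6250000
    }
}
```

The cost is fixed by the corpus size rather than the set size, which is why one bitset per logical set stays predictable even when the sets change often.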

While debugging the code, I see that the IdsFilter is converted to a Lucene 
filter. Lucene filters, on the other hand, operate on a bitset of doc IDs. 
That fits my requirement very well, since I can scale with BitSets (I 
assume).
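Conceptually, applying a bitset filter means intersecting the documents that match the query with the documents the filter allows. The sketch below shows only that idea with `BitSet.and` and iteration via `nextSetBit`; it is not how Lucene's own `Filter`/`DocIdSet` classes are implemented, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class BitSetFilterSketch {
    // Returns the doc ordinals that match the query AND are allowed by the filter.
    // Neither input is modified; the intersection is computed on a copy.
    public static List<Integer> applyFilter(BitSet queryMatches, BitSet allowed) {
        BitSet result = (BitSet) queryMatches.clone();
        result.and(allowed);
        List<Integer> hits = new ArrayList<>();
        for (int doc = result.nextSetBit(0); doc >= 0; doc = result.nextSetBit(doc + 1)) {
            hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        BitSet matches = new BitSet();
        matches.set(1); matches.set(4); matches.set(9); matches.set(12);
        BitSet allowed = new BitSet();
        allowed.set(4); allowed.set(12); allowed.set(30);
        System.out.println(applyFilter(matches, allowed)); // [4, 12]
    }
}
```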

If I can find a way to plug this BitSet directly into the Lucene search() 
call as a Lucene Filter, bypassing the ES filters (maybe through some sort 
of plugin, I don't know), I believe that would serve my purpose. I assume I 
would not get to use the ES filter cache, but I could probably cache these 
BitSets for subsequent use.
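If the ES filter cache is bypassed, the BitSets would need an application-side cache. A minimal sketch, assuming each logical set is identified by a hypothetical string key (for example a use-case name) and built by a caller-supplied loader. Because Lucene doc numbers can change when segments are merged, cached entries would have to be dropped whenever the index changes; this sketch only supports that through an explicit `invalidateAll()`.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class FilterBitSetCache {
    private final Map<String, BitSet> cache = new ConcurrentHashMap<>();
    private final Function<String, BitSet> loader;

    // loader builds the BitSet for a set key, e.g. by resolving IDs from a DB (hypothetical).
    public FilterBitSetCache(Function<String, BitSet> loader) {
        this.loader = loader;
    }

    // Returns the cached BitSet for the key, building it on first use.
    // The returned BitSet is shared, so callers must not mutate it.
    public BitSet get(String setKey) {
        return cache.computeIfAbsent(setKey, loader);
    }

    // Drops every cached set, e.g. after a refresh or merge changes doc numbering.
    public void invalidateAll() {
        cache.clear();
    }
}
```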

Please let me know. And thanks!

Thanks,
Sandeep


On Saturday, 5 July 2014 01:40:55 UTC+5:30, Jörg Prante wrote:
>
> What I understand is that a TermsFilter is required:
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-terms-filter.html
>
> and the source of the terms is a DB. That is no problem. The plan is: 
> fetch the terms from the DB, build the query (either Java API or JSON) and 
> execute it.
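The plan above (fetch the terms from the DB, build the query, execute it) can be sketched with plain string building. This assumes the ES 1.x `filtered` query with a `terms` filter, a hypothetical field name `category_id`, and terms already fetched from the DB; the resulting JSON body would then be sent to the `_search` endpoint. A real client would normally use a JSON library or the Java API builders rather than hand-escaping.

```java
import java.util.Arrays;
import java.util.List;

public class TermsFilterQuery {
    // Minimal JSON string escaping for quotes, backslashes and control characters.
    public static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Builds an ES 1.x style filtered query: match_all restricted by a terms filter.
    public static String build(String field, List<String> terms) {
        StringBuilder values = new StringBuilder();
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) values.append(',');
            values.append(quote(terms.get(i)));
        }
        return "{\"query\":{\"filtered\":{"
                + "\"query\":{\"match_all\":{}},"
                + "\"filter\":{\"terms\":{" + quote(field) + ":[" + values + "]}}}}}";
    }

    public static void main(String[] args) {
        List<String> termsFromDb = Arrays.asList("a1", "b2"); // stand-in for a DB query result
        System.out.println(build("category_id", termsFromDb));
        // {"query":{"filtered":{"query":{"match_all":{}},"filter":{"terms":{"category_id":["a1","b2"]}}}}}
    }
}
```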
>
> What I don't understand is the part with the "quick mapping", Lucene, and 
> the doc IDs. Lucene doc IDs are not reliable and are not exposed by 
> Elasticsearch. Elasticsearch uses its own document identifiers, which are 
> stable and augmented with information about the index type they belong to 
> in order to make them unique. But I do not understand why this is 
> important in this context.
>
> The Elasticsearch API uses query builders and filter builders to build 
> search requests. A "quick mapping" is just fetching the terms from the DB 
> as a string array before this API is called.
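A minimal sketch of that "quick mapping" step, assuming a hypothetical `TermSource` interface that wraps the DB lookup (for example a JDBC query). It only turns the fetched values into a de-duplicated `String[]`; the ES 1.x Java API call mentioned in the comment is indicative and not exercised here.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

public class FilterTermsFetcher {
    // Hypothetical abstraction over the DB lookup for one request.
    public interface TermSource {
        Collection<String> fetch(String useCase);
    }

    // Fetches the terms for a use case and returns them as a de-duplicated
    // String[] in first-seen order, skipping nulls and empty strings.
    // With the ES 1.x Java API, the array would then be passed to something like
    // FilterBuilders.termsFilter("category_id", terms) when building the request.
    public static String[] fetchTerms(TermSource source, String useCase) {
        Set<String> unique = new LinkedHashSet<>();
        for (String t : source.fetch(useCase)) {
            if (t != null && !t.isEmpty()) {
                unique.add(t);
            }
        }
        return unique.toArray(new String[0]);
    }

    public static void main(String[] args) {
        TermSource fakeDb = useCase -> Arrays.asList("x", "y", "x", "", null, "z");
        System.out.println(Arrays.toString(fetchTerms(fakeDb, "caseA"))); // [x, y, z]
    }
}
```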
>
> I also do not understand the role of the number "1M". Is this the number 
> of fields or the number of terms? Is it a total number or a number per 
> query?
>
> Did I misunderstand anything else? I am not really sure what the challenge 
> is...
>
> Jörg
>
>
>
> On Fri, Jul 4, 2014 at 8:55 PM, 'Sandeep Ramesh Khanzode' via 
> elasticsearch wrote:
>
>> Hi,
>>
>> Just to give some background: I will have a large corpus of more than 
>> 100M indexed documents. The filters that I want to apply will be on a 
>> field that is not indexed. I mean, I prefer not to index these values in 
>> ES/Lucene, since they will change frequently. So I will maintain them 
>> elsewhere, such as in a DB.
>>
>> Every time I run a query, I want to filter the results by those fields 
>> that are not indexed in Lucene, and I am guessing the number of values may 
>> well be more than 1M. In that case, since we would maintain some sort of 
>> TermsFilter, I think it may not scale linearly. What I would prefer is a 
>> hook inside the ES query, so that I can inject the required filter values 
>> at query time. Since the filter values have to be recognized by Lucene and 
>> I will not be indexing them, I will need some quick mapping from those 
>> external values to some field in Lucene that I can use in the filter. I am 
>> not sure whether we can access and set Lucene doc IDs in the filter, or 
>> whether they are even exposed in ES.
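One way to read the "quick mapping" idea: keep the frequently changing attribute only in the external store, together with the ES document `_id`s it applies to, and resolve it to IDs at query time. A minimal sketch, assuming a hypothetical in-memory map standing in for the DB and the ES 1.x `ids` filter JSON form `{"ids":{"values":[...]}}`. With up to 1M IDs per query, the request itself becomes large, which is the scaling worry raised above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

public class ExternalAttributeMapper {
    // Resolves the requested attribute values to ES document _ids using the
    // external store (here a Map standing in for the DB), de-duplicating IDs.
    public static Set<String> resolveIds(Map<String, List<String>> store, List<String> values) {
        Set<String> ids = new LinkedHashSet<>();
        for (String v : values) {
            ids.addAll(store.getOrDefault(v, Collections.emptyList()));
        }
        return ids;
    }

    // Renders the IDs as an ES 1.x ids filter; IDs are assumed not to need JSON escaping.
    public static String idsFilter(Set<String> ids) {
        StringJoiner values = new StringJoiner(",", "[", "]");
        for (String id : ids) {
            values.add("\"" + id + "\"");
        }
        return "{\"ids\":{\"values\":" + values + "}}";
    }

    public static void main(String[] args) {
        Map<String, List<String>> fakeDb = new HashMap<>();
        fakeDb.put("team-red", Arrays.asList("doc1", "doc7"));
        fakeDb.put("team-blue", Arrays.asList("doc7", "doc9"));
        Set<String> ids = resolveIds(fakeDb, Arrays.asList("team-red", "team-blue"));
        System.out.println(idsFilter(ids)); // {"ids":{"values":["doc1","doc7","doc9"]}}
    }
}
```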
>>
>> Please help me with this.
>>
>> Thanks,
>> Sandeep
>>
>>
>> On Thursday, 3 July 2014 21:33:45 UTC+5:30, Jörg Prante wrote:
>>
>>> Maybe I do not fully understand, but in a client, you can fetch the 
>>> required filter terms from any external source before a JSON query is 
>>> constructed?
>>>
>>> Can you give an example of what you want to achieve?
>>>
>>> Jörg
>>>
>>>
>>> On Thu, Jul 3, 2014 at 3:34 PM, 'Sandeep Ramesh Khanzode' via 
>>> elasticsearch wrote:
>>>
>>>> Hi All,
>>>>
>>>> I am new to ES and I have the following requirement:
>>>> I need to specify a list of strings as a filter that applies to a 
>>>> specific field in the document, just as a normal filter does, but instead 
>>>> of sending the strings with the query, I would like them to be populated 
>>>> from an external source, such as a DB. Can you please point me to relevant 
>>>> examples or references for achieving this on v1.1.2?
>>>>
>>>> Thanks,
>>>> Sandeep
>>>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f2ec45c7-8980-4005-9e1b-fc9a6aa422e0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
