Looking through the code, I am thinking of the following approach:

1. Write a plugin that will accept an encoded string containing the doc ids 
instead of the array of ids
2. Add a custom IdsFilterParser that will decode this string to a bit set 
and pass it downstream.
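
To make the two steps above concrete, here is a minimal sketch of the encode/decode part only, using nothing but the JDK (java.util.BitSet and java.util.Base64, so Java 8 or later). The class name DocIdCodec and its methods are hypothetical names for illustration, not part of Elasticsearch or Lucene, and the sketch assumes the doc ids are non-negative integers that can be used directly as bit positions. Wiring the decoded bit set into a custom filter parser is not shown.

```java
import java.util.Base64;
import java.util.BitSet;

// Hypothetical helper: packs a set of integer doc ids into a string and back.
// Assumption: ids are non-negative ints, so each id maps to one bit position.
final class DocIdCodec {

    private DocIdCodec() {
        // static helpers only
    }

    // Client side: serialize the bit set to bytes, then Base64 so that it
    // can travel as a single string field in the request.
    static String encode(BitSet docIds) {
        return Base64.getEncoder().encodeToString(docIds.toByteArray());
    }

    // Plugin side: reverse of encode(); returns a bit set with the same bits set.
    static BitSet decode(String encoded) {
        return BitSet.valueOf(Base64.getDecoder().decode(encoded));
    }
}
```

Under that assumption, a bit set covering 10 million ids is about 1.25 MB raw and roughly 1.7 MB after Base64, whatever the number of ids selected, which is much smaller than a list of 5 million id strings.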

But it seems that the TermsFilter also needs to be customized (or a custom 
TermsFilter added), since TermsFilter.getDocIdSet is the method that would 
have to be overridden or modified to generate the DocIdSet from a set of doc 
ids rather than from a list of TermsAndFields as it does now.

Is this the right approach? Any pointers?

Thanks,
Shantanu Sen


On Wednesday, June 11, 2014 9:26:27 PM UTC-7, Shantanu Sen wrote:
>
> Hi,
>
> We are currently using Lucene and are exploring Elasticsearch for scaling. 
> We have a requirement to filter queries based on doc id, and the set of docs 
> to be filtered can be quite large, e.g. out of a corpus of 10 million 
> documents, a user can choose a set of 5 million and run a query targeting 
> that subset. Hence we need to pass in a set of 5 million doc ids so that 
> the query can run only on those rather than the full index.
>
> I am planning to use a mapped _id field that will be set during index 
> mapping and then use a filtered query with IdsFilterBuilder to generate a 
> filtered query. The issue is that the API takes a list of strings and hence 
> will not scale - ideally we would like to pass in a bit set containing all 
> the doc ids.
>
> We will be using the java api. What is the best way to approach this 
> issue? I understand that we would need to write a custom API that will 
> accept a bit set. If we write a plugin, can we access the internal APIs of 
> Elasticsearch and hence not use the SearchRequestBuilder? 
>
> Is a plugin the right approach? Any pointers as to where to start?
>
> Thanks,
> Shantanu Sen
>
