Hi,

I appreciate your continued assistance. :) Thanks.

Disclaimer: I do not yet understand the ES sources well enough to describe my scenario completely, so some of the information below may be conjecture.

I will have a corpus of 50M docs (actually a lot more, but this is the size for testing now), out of which up to 1M doc IDs would be used as a filter. This set of 1M docs can differ between use cases; the point is that up to 1M doc IDs can form one logical set of documents for filtering results.

If I use a simple IdsFilter from the ES Java API, I would have to keep adding these 1M doc IDs to its internal List implementation. I have a feeling this may not scale very well, since the sets may change per use case and also per some combinations within a single use case.

From debugging the code, I see that the IdsFilter is converted to a Lucene filter. Lucene filters, on the other hand, operate on a doc ID bitset. That fits my requirement very well, since I can scale with BitSets (I assume). If I can find a way to plug this BitSet directly into the Lucene search() call as a Lucene Filter, bypassing the ES filters, perhaps through some sort of plugin, I believe that would support my cause. I assume I would not get to use the ES filter cache, but I could probably cache these BitSets for subsequent use.

Please let me know. And thanks!

Thanks,
Sandeep

On Saturday, 5 July 2014 01:40:55 UTC+5:30, Jörg Prante wrote:
>
> What I understand is a TermsFilter is required
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-terms-filter.html
>
> and the source of the terms is a DB. That is no problem. The plan is:
> fetch the terms from the DB, build the query (either Java API or JSON) and
> execute it.
>
> What I don't understand is the part with the "quick mapping", Lucene, and
> the doc ids.
> Lucene doc IDs are not reliable and are not exposed by
> Elasticsearch. Elasticsearch uses its own document identifiers, which are
> stable and augmented with info about the index type they belong to, in
> order to make them unique. But I do not understand why this is important in
> this context.
>
> The Elasticsearch API uses query builders and filter builders to build search
> requests. A "quick mapping" is just fetching the terms from the DB as a
> string array before this API is called.
>
> I also do not understand the role of the number "1M": is this the number
> of fields, or the number of terms? Is it a total number or a number per
> query?
>
> Did I misunderstand anything more? I am not really sure what the
> challenge is...
>
> Jörg
>
> On Fri, Jul 4, 2014 at 8:55 PM, 'Sandeep Ramesh Khanzode' via
> elasticsearch <[email protected]> wrote:
>
>> Hi,
>>
>> Just to give some background. I will have a large-ish corpus of more than
>> 100M documents indexed. The filters that I want to apply will be on a field
>> that is not indexed. I mean, I prefer to not have them indexed in ES/Lucene
>> since they will be frequently changing. So, for that, I will be maintaining
>> them elsewhere, like a DB etc.
>>
>> Every time I have a query, I would want to filter the results by those
>> fields that are not indexed in Lucene. And I am guessing that number may
>> well be more than 1M. In that case, I think, since we will maintain some
>> sort of TermsFilter, it may not scale linearly. What I would want to do,
>> preferably, is to have a hook inside the ES query, so that I can, at query
>> time, inject the required filter values. Since the filter values have to be
>> recognized by Lucene, and I will not be indexing them, I will need to do
>> some quick mapping to get those fields and map them quickly to some field
>> in Lucene that I can save in the filter.
>> I am not sure whether we can
>> access and set Lucene DocIDs in the filter or whether they are even exposed
>> in ES.
>>
>> Please assist with this query.
>>
>> Thanks,
>> Sandeep
>>
>> On Thursday, 3 July 2014 21:33:45 UTC+5:30, Jörg Prante wrote:
>>
>>> Maybe I do not fully understand, but in a client, you can fetch the
>>> required filter terms from any external source before a JSON query is
>>> constructed?
>>>
>>> Can you give an example of what you want to achieve?
>>>
>>> Jörg
>>>
>>> On Thu, Jul 3, 2014 at 3:34 PM, 'Sandeep Ramesh Khanzode' via
>>> elasticsearch <[email protected]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I am new to ES and I have the following requirement:
>>>> I need to specify a list of strings as a filter that applies to a
>>>> specific field in the document. Like what a filter does, but instead of
>>>> sending them in the query, I would like them to be populated from an
>>>> external source, like a DB or something. Can you please guide me to the
>>>> relevant examples or references to achieve this on v1.1.2?
>>>>
>>>> Thanks,
>>>> Sandeep
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>>
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/elasticsearch/0093d97d-0f47-48e9-ba19-85b0850eda89%40googlegroups.com
>>>> For more options, visit https://groups.google.com/d/optout.
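
As a rough illustration of the BitSet idea raised in the newest message of this thread, here is a minimal sketch in plain Java. It does not use any Elasticsearch or Lucene API and does not hook into the Lucene search() call. Because Jörg points out that Lucene doc IDs are not reliable, the sketch assumes the application assigns its own dense ordinal to each stable document ID and applies the bitset to the hit IDs after the search returns. The class name `AllowedDocSet` and its methods are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Elasticsearch or Lucene. */
final class AllowedDocSet {
    // Assumption: stable document IDs (for example the ES _id) are mapped to
    // dense ordinals that the application assigns itself.
    private final Map<String, Integer> ordinals = new HashMap<>();

    /** Returns the ordinal for an ID, assigning the next free one on first use. */
    int ordinalOf(String id) {
        Integer existing = ordinals.get(id);
        if (existing != null) {
            return existing;
        }
        int next = ordinals.size();
        ordinals.put(id, next);
        return next;
    }

    /** Builds the bitset for one logical set of allowed documents. */
    BitSet toBitSet(Iterable<String> allowedIds) {
        BitSet bits = new BitSet();
        for (String id : allowedIds) {
            bits.set(ordinalOf(id));
        }
        return bits;
    }

    /** Keeps only the hits whose ordinal is set in the allowed bitset. */
    List<String> filter(List<String> hitIds, BitSet allowed) {
        List<String> kept = new ArrayList<>();
        for (String id : hitIds) {
            Integer ord = ordinals.get(id);
            if (ord != null && allowed.get(ord)) {
                kept.add(id);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        AllowedDocSet docs = new AllowedDocSet();
        // One logical set of allowed documents, e.g. loaded from a DB.
        BitSet allowed = docs.toBitSet(Arrays.asList("a", "c"));
        // IDs of the hits returned by some search.
        List<String> hits = Arrays.asList("a", "b", "c", "d");
        System.out.println(docs.filter(hits, allowed)); // prints [a, c]
    }
}
```

A bitset spanning 50M ordinals needs roughly 6 MB, so caching one bitset per logical set is at least plausible memory-wise. Filtering after the search, as done here, will distort paging and hit counts, so this only illustrates the data structure and is not a substitute for a real filter.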
