Sorry for the weird reply path, but I couldn’t find an easy reply method via the list archive.
Anyway … The use case is as follows: allow the user to specify queries such as ‘free*’ and also list similar words to be ignored, such as ‘freedom’. Another example would be ‘secret*’ and ‘secretary’.

I wanted to keep the ignore words separate so they apply to all queries, but then realized the ignore words should only apply to relevant (matching) queries. I don’t want users to be required to add ‘and not WORD’ many times to each of the listed queries.

David Shifflett

From: Diego Ceccarelli

Could you please describe the use case? Maybe there is an easier solution.

From: "Shifflett, David [USA]" <shifflett_da...@bah.com>
Date: Tuesday, July 9, 2019 at 8:02 AM
To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
Subject: How to ignore certain words based on query specifics

Hi all,

I have a configuration file that lists multiple queries, of all different types, and that lists words to be ignored. Each of these lists is user configured and variable in length and content.

I know that, in general, an ignore word won’t match unless it appears in the query, but I need to be able to handle wildcard, fuzzy, and regex queries, which might match it.

What I need to be able to do is ignore the words in the ignore list, but only when they match terms the query would match.

For example: if the query is ‘free*’ and ‘freedom’ should be ignored, I could modify the query to be ‘free* and not freedom’. But if ‘liberty’ is also to be ignored, I don’t want to add ‘and not liberty’ to that query, because that could produce false negatives for documents containing both ‘free’ and ‘liberty’.

I think what I need to do is:

    for each query
        for each ignore word
            if the query would match the ignore word,
                add ‘and not <ignore word>’ to the query

How can I test whether a query would match an ignore word without putting the ignore words into an index and searching that index? This seems like overkill.

To make matters worse, a query like ‘A and B and C’ won’t match an index of ignore words that contains C but not A or B.

Thanks in advance for any suggestions or advice,
David Shifflett
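
For the wildcard case only, the "would this query match this ignore word" test from the loop above can be approximated in memory, with no index involved. The sketch below is plain Java and does not use any Lucene API: the class IgnoreWordFilter and its methods are hypothetical names introduced for illustration. It assumes '*' means any run of characters and '?' means exactly one character, compares case-sensitively, and ignores analyzers and escape characters, so it may disagree with what a real Lucene query would match. Fuzzy and regex queries would need their own checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Lucene): in-memory approximation of wildcard term matching. */
final class IgnoreWordFilter {

    private IgnoreWordFilter() {}

    /**
     * Translates a wildcard term into a java.util.regex pattern.
     * Assumption: '*' matches any run of characters, '?' matches exactly one.
     * Escape sequences in the wildcard are not handled in this sketch.
     */
    static Pattern wildcardToPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                // Flush the literal text collected so far, quoted so it is matched verbatim.
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Returns only the ignore words that the wildcard term itself would match, in input order. */
    static List<String> relevantIgnoreWords(String wildcard, List<String> ignoreWords) {
        Pattern pattern = wildcardToPattern(wildcard);
        List<String> relevant = new ArrayList<>();
        for (String word : ignoreWords) {
            // matches() requires the whole word to match, like a single-term wildcard.
            if (pattern.matcher(word).matches()) {
                relevant.add(word);
            }
        }
        return relevant;
    }
}
```

With the ignore list [freedom, liberty], calling IgnoreWordFilter.relevantIgnoreWords("free*", ...) keeps only ‘freedom’, so only that word would be added as an exclusion to the ‘free*’ query and ‘liberty’ would be left alone. For a multi-clause query such as ‘A and B and C’, the same check could be run per term rather than against the query as a whole, which sidesteps the problem of the ignore list containing C but not A or B.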