Simon Willnauer created LUCENE-4628:
---------------------------------------

             Summary: Add common terms query to gracefully handle very high 
frequent terms dynamically
                 Key: LUCENE-4628
                 URL: https://issues.apache.org/jira/browse/LUCENE-4628
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/other
            Reporter: Simon Willnauer
            Priority: Minor
             Fix For: 4.1, 5.0


Several times over the last couple of months I have run into searches that 
contained very high-frequency terms, which made disjunction queries far too 
slow. Stopword filtering wasn't really an option, because in that domain the 
high-frequency terms were not really stopwords. For instance, searching for 
the song title "this is it" or for a band called "A" doesn't work once 
stopwords are removed. I thought about this for a while and came up with a 
query-based solution that decides, based on a threshold, whether a term is 
treated as a stopword. The terms are split into two boolean queries, one for 
high-frequency and one for low-frequency terms, such that the high-frequency 
terms are only matched if the low-frequency sub-query produces a match. If 
all terms are high-frequency, the entire query becomes a conjunction, which 
gave me reasonable results as well as reasonable performance. 
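
The splitting rule described above can be sketched in plain Java without any 
Lucene dependency. This is only an illustration of the idea, not the patch 
attached to this issue: the class and method names (CommonTermsSketch, split, 
matches) are hypothetical, and expressing the threshold as a fraction of the 
total document count is an assumption made for the example. Scoring is left 
out entirely; the sketch only answers whether a document would match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the high/low frequency split; not Lucene code. */
final class CommonTermsSketch {

    /** Query terms partitioned by document frequency. */
    static final class Split {
        final List<String> lowFreq = new ArrayList<>();
        final List<String> highFreq = new ArrayList<>();
    }

    /**
     * Puts each term into the high-frequency group if its document frequency
     * exceeds maxTermFrequency * numDocs (assumed threshold semantics),
     * otherwise into the low-frequency group.
     */
    static Split split(List<String> terms, Map<String, Integer> docFreq,
                       int numDocs, double maxTermFrequency) {
        Split s = new Split();
        double cutoff = maxTermFrequency * numDocs;
        for (String term : terms) {
            int df = docFreq.getOrDefault(term, 0);
            if (df > cutoff) {
                s.highFreq.add(term);
            } else {
                s.lowFreq.add(term);
            }
        }
        return s;
    }

    /** Decides whether a document (given as its set of terms) matches. */
    static boolean matches(Split s, Set<String> docTerms) {
        if (s.lowFreq.isEmpty() && s.highFreq.isEmpty()) {
            return false; // empty query matches nothing
        }
        if (s.lowFreq.isEmpty()) {
            // Only high-frequency terms: require all of them (conjunction).
            return docTerms.containsAll(s.highFreq);
        }
        // Otherwise the low-frequency disjunction decides the match; the
        // high-frequency terms would only contribute to scoring.
        for (String term : s.lowFreq) {
            if (docTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }
}
```

With a cutoff of half the documents, a query such as "this is thriller" 
would be driven by "thriller" alone, while "this is it" would fall back to 
requiring all three terms.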

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org