[ https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054162#comment-13054162 ]
Mike Sokolov commented on LUCENE-3234:
--------------------------------------

I did go back and look at the original case that made me worried; in that case the "bad" document is 650K, and the matched term occurs 23000 times in it. The search still finishes in 24 sec or so on my desktop, which isn't too bad I guess, considering. After looking at that and measuring the change in the test case in the patch as the number of terms increases, I don't think there actually is n^2 growth, just linear, but the growth is still enough that the patch has value. The test case in the patch is closely targeted at the method which takes all the time when you have large numbers of matching terms in a single document.

> Provide limit on phrase analysis in FastVectorHighlighter
> ---------------------------------------------------------
>
>                 Key: LUCENE-3234
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3234
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mike Sokolov
>         Attachments: LUCENE-3234.patch
>
>
> With larger documents, FVH can spend a lot of time trying to find the
> best-scoring snippet as it examines every possible phrase formed from
> matching terms in the document. If one is willing to accept
> less-than-perfect scoring by limiting the number of phrases that are
> examined, substantial speedups are possible. This is analogous to the
> Highlighter limit on the number of characters to analyze.
> The patch includes an artificial test case that shows more than 1000x
> speedup. In a more normal test environment, with English documents and
> random queries, I am seeing speedups of around 3-10x when setting
> phraseLimit=1, which has the effect of selecting the first possible
> snippet in the document. Most of our sites operate in this way (just show
> the first snippet), so this would be a big win for us.
> With phraseLimit = -1, you get the existing FVH behavior.
> At larger values of phraseLimit, you may not get substantial speedup in
> the normal case, but you do get the benefit of protection against blow-up
> in pathological cases.
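
To make the trade-off in the description concrete, here is a minimal sketch of capping how many candidate phrases get scored. It is not the code from LUCENE-3234.patch and uses no Lucene classes; the class `PhraseLimitSketch`, the method `bestCandidate`, and the array of precomputed scores are all hypothetical names chosen for illustration. The only things taken from the description are the parameter name `phraseLimit`, the convention that -1 means "examine everything", and the observation that a limit of 1 selects the first possible snippet.

```java
// Illustration only: a hypothetical stand-in for "examine at most phraseLimit
// candidate phrases, then keep the best-scoring one seen so far".
final class PhraseLimitSketch {

    /**
     * Returns the index of the best-scoring candidate among those examined,
     * or -1 if no candidate was examined.
     *
     * @param scores      hypothetical per-candidate scores, in document order
     * @param phraseLimit maximum number of candidates to examine; a negative
     *                    value means no limit (assumed from the description,
     *                    where -1 keeps the existing behavior)
     */
    static int bestCandidate(double[] scores, int phraseLimit) {
        int best = -1;
        int examined = 0;
        for (int i = 0; i < scores.length; i++) {
            // Stop early once the cap is reached; this is where the time is saved.
            if (phraseLimit >= 0 && examined >= phraseLimit) {
                break;
            }
            examined++;
            if (best < 0 || scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] scores = {0.2, 0.9, 0.5};
        System.out.println("phraseLimit=1  -> candidate " + bestCandidate(scores, 1));
        System.out.println("phraseLimit=-1 -> candidate " + bestCandidate(scores, -1));
    }
}
```

With a limit of 1 the sketch returns the first candidate even though a later one scores higher, which is the "less-than-perfect scoring" the description accepts in exchange for bounded work per document.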