OK - it seems as if there is a blow-up in FieldPhraseList if a document
has a large number of occurrences of a term that is in the query. In
one example, I searched for "1", and this occurs just under 2000 times
in one of my test documents (as the value of HTML attributes).
Admittedly a weird case, but when this happens, the highlighting can
take 300x longer than when searching for a more distinctive term (like
"distinctive").
I think there may be a problem here in that every term occurrence is
compared against every other term occurrence (or every "phrase" within
which the term may occur - I think?) so there is an n^2 growth factor in
the number of occurrences of a term in a document. Does that seem possible?
-Mike
On 6/21/2011 8:48 PM, Michael Sokolov wrote:
I did that, and the benchmark indicates FVH is 10x faster than
Highlighter now. I ran with a subset of the wikipedia data since I
didn't want to deal with the whole thing. I'm trying to reconcile
these weirdly varying results. One difference is that the benchmark
doesn't use PhraseQueries - I added those and it did make FVH slightly
slower, but not all that much. I'll keep digging.
-Mike
On 6/20/2011 10:54 PM, Michael Sokolov wrote:
Koji- I'm not familiar with the benchmarking system, but maybe I'll
see if I can run that benchmark on my test data as a point of
comparison - thanks for the pointer!
-Mike
On 6/20/2011 8:21 PM, Koji Sekiguchi wrote:
Mike,
FVH used to be faster for large docs. I wrote FVH section for Lucene
in Action and it said:
In contrib/benchmark (covered in appendix C), there’s an algorithm
file called highlight-vs-vector-highlight.alg that lets you see the
difference
between two highlighters in processing time. As of version 2.9, with
modern hardware,
that algorithm shows that FastVectorHighlighter is about two and a
half times faster
than Highlighter.
The number was for Lucene 2.9 age and Wikipedia data, maybe
different today.
Anyway, thank you for sharing interesting result!
koji
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org