I'm not intimately familiar with FVH myself, but that sounds reasonable.
Tests usually don't lie. I'd definitely like to see a patched version
that avoids that!
Itamar.
On 22/06/2011 05:29, Michael Sokolov wrote:
OK - it seems as if there is a blow-up in FieldPhraseList if a
document has a large number of occurrences of a term that is in the
query. In one example, I searched for "1", and this occurs just under
2000 times in one of my test documents (as the value of HTML
attributes). Admittedly a weird case, but when this happens, the
highlighting can take 300x longer than when searching for a more
distinctive term (like "distinctive").
I think the problem may be that every term occurrence is compared
against every other term occurrence (or every "phrase" within which
the term may occur - I think?), so the cost grows as n^2, where n is
the number of occurrences of the term in the document. Does that seem
possible?
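
To make the suspected growth concrete, here is a minimal sketch of an
all-pairs comparison. It is not taken from FieldPhraseList; the function
name and the list of integer positions are made up for illustration, and
the loop body only counts the comparisons rather than doing any real
highlighting work.

```python
def count_pairwise_comparisons(positions):
    """Count the comparisons made when every occurrence is checked
    against every later occurrence (hypothetical stand-in, not FVH code)."""
    comparisons = 0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            # Stand-in for whatever per-pair work the highlighter would do.
            comparisons += 1
    return comparisons


# A distinctive term with 20 occurrences vs. a term with 2000 occurrences.
few = count_pairwise_comparisons(list(range(20)))     # 190
many = count_pairwise_comparisons(list(range(2000)))  # 1999000
print(few, many)
```

With n occurrences the count is n*(n-1)/2, so 100x more occurrences means
roughly 10,000x more comparisons. If the real code behaves like this, a
term occurring about 2000 times would be enough to dominate the
highlighting time.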
-Mike
On 6/21/2011 8:48 PM, Michael Sokolov wrote:
I did that, and the benchmark indicates FVH is 10x faster than
Highlighter now. I ran with a subset of the wikipedia data since I
didn't want to deal with the whole thing. I'm trying to reconcile
these weirdly varying results. One difference is that the benchmark
doesn't use PhraseQueries - I added those and it did make FVH
slightly slower, but not all that much. I'll keep digging.
-Mike
On 6/20/2011 10:54 PM, Michael Sokolov wrote:
Koji- I'm not familiar with the benchmarking system, but maybe I'll
see if I can run that benchmark on my test data as a point of
comparison - thanks for the pointer!
-Mike
On 6/20/2011 8:21 PM, Koji Sekiguchi wrote:
Mike,
FVH used to be faster for large docs. I wrote the FVH section for
Lucene in Action, and it said:
In contrib/benchmark (covered in appendix C), there’s an algorithm
file called highlight-vs-vector-highlight.alg that lets you see the
difference between two highlighters in processing time. As of
version 2.9, with modern hardware, that algorithm shows that
FastVectorHighlighter is about two and a half times faster than
Highlighter.
That number was for Lucene 2.9 and Wikipedia data; it may be
different today.
Anyway, thank you for sharing an interesting result!
koji
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org