I'm not intimately familiar with FVH myself, but that sounds reasonable. Tests usually don't lie. I'd definitely like to see a patched version that avoids that!

Itamar.

On 22/06/2011 05:29, Michael Sokolov wrote:
OK - it seems as if there is a blow-up in FieldPhraseList if a document has a large number of occurrences of a term that is in the query. In one example, I searched for "1", and this occurs just under 2000 times in one of my test documents (as the value of HTML attributes). Admittedly a weird case, but when this happens, the highlighting can take 300x longer than when searching for a more distinctive term (like "distinctive").

I think there may be a problem here in that every term occurrence is compared against every other term occurrence (or every "phrase" within which the term may occur - I think?) so there is an n^2 growth factor in the number of occurrences of a term in a document. Does that seem possible?

-Mike

On 6/21/2011 8:48 PM, Michael Sokolov wrote:
I did that, and the benchmark indicates FVH is 10x faster than Highlighter now. I ran with a subset of the wikipedia data since I didn't want to deal with the whole thing. I'm trying to reconcile these weirdly varying results. One difference is that the benchmark doesn't use PhraseQueries - I added those and it did make FVH slightly slower, but not all that much. I'll keep digging.

-Mike

On 6/20/2011 10:54 PM, Michael Sokolov wrote:
Koji- I'm not familiar with the benchmarking system, but maybe I'll see if I can run that benchmark on my test data as a point of comparison - thanks for the pointer!

-Mike

On 6/20/2011 8:21 PM, Koji Sekiguchi wrote:
Mike,

FVH used to be faster for large docs. I wrote FVH section for Lucene in Action and it said:

In contrib/benchmark (covered in appendix C), there’s an algorithm
file called highlight-vs-vector-highlight.alg that lets you see the difference between two highlighters in processing time. As of version 2.9, with modern hardware, that algorithm shows that FastVectorHighlighter is about two and a half times faster
than Highlighter.

The number was for Lucene 2.9 age and Wikipedia data, maybe different today.

Anyway, thank you for sharing interesting result!

koji


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to